Abstract


The problem of model-based object recognition is a fundamental one in the
field of computer vision, and represents a promising direction for practical
applications.

We  describe  the  design,  analysis,  implementation  and  testing  of  a  system 
that  employs  geometric  hashing  techniques,  and  can  recognize  three-dimensional 
objects  from  two-dimensional  grayscale  images.  We  examine  the  exploitation  of 
parallelism  in  object  recognition,  and  analyze  the  performance  and  sensitivity 
of  the  geometric  hashing  method  in  the  presence  of  noise.  We  also  present  a 
Bayesian  interpretation  of  the  geometric  hashing  approach. 

Two  parallel  algorithms  are  outlined:  one  algorithm  is  designed  for  an  SIMD 
hypercube-based  machine  whereas  the  other  algorithm  is  more  general,  and  relies 
on  data  broadcast  capabilities.  The  first  of  the  two  algorithms  regards  geometric 
hashing  as  a  connectionist  algorithm.  The  second  algorithm  is  inspired  by  the 
method  of  inverse  indexing  for  data  retrieval. 

We also determine the expected distribution of computed invariants over the
hash space: formulas for the distributions of invariants are derived for the cases
of rigid, similarity and affine transformations, and for two different distributions
(Gaussian, and uniform over a disc) of point features. Formulas describing the
dependency of the geometric invariants on Gaussian positional error are also derived
for the similarity and affine transformation cases.

Finally, we present an interpretation that allows the geometric hashing
algorithm to be viewed as a Bayesian approach to model-based object recognition.
This interpretation is a new form of Bayesian-based model matching, and leads to
natural, well-justified formulas. The interpretation also provides a precise
weighted-voting method for the evidence-gathering phase of geometric hashing.

A prototype object recognition system using these ideas has been implemented
on a CM-2 Connection Machine. The system is scalable and can recognize aircraft
and automobile models subjected to 2D rotation, translation, and scale changes
in real-world digital imagery. This system is the first of its kind that is scalable,
uses large databases, can handle noisy input data, works rapidly on an existing
parallel architecture, and exhibits excellent performance with real-world, natural
scenes.




Contents 


1  Introduction  1 

1.1  Object  Recognition:  the  four  stages .  2 

1.1.1  Data  Acquisition .  2 

1.1.2  Feature  Extraction .  3 

1.1.3  Matching .  5 

1.1.4  Verification .  6 

1.2  The  Scope  of  this  Dissertation .  7 

1.2.1  Exploitation  of  Parallelism .  7 

1.2.2  Distributions  of  Invariants .  8 

1.2.3  Modeling  of  Noise .  8 

1.2.4  Bayesian  Interpretation .  8 

2  An  Introduction  to  Model-based  Object  Recognition  10 

2.1  A  Survey  of  the  Object  Recognition  Field .  11 

2.2  Indexing  Methods  .  18 

2.2.1  Geometric  Hashing  for  Model  Matching .  19 

2.2.1.1 The Steps Behind the Idea . 23

2.3  Geometric  Hashing  Systems .  30 




3  Exploiting  Parallelism  35 

3.1  Parallelizability  of  Geometric  Hashing .  36 

3.2  Some  Definitions .  38 

3.3  Design  Issue .  40 

3.4  Building-block  Algorithms .  41 

3.4.1  The  p-product  .  41 

3.4.2 Histogramming . 44

3.4.2.1 A Novel Radix-Sort Algorithm . 46

3.5  The  Geometric  Hashing  Connectionist  Algorithm .  46 

3.5.1  Connectionist  Algorithm:  Preprocessing  Phase .  48 

3.5.2  Connectionist  Algorithm:  Recognition  Phase .  52 

3.5.3  Time  Complexity .  54 

3.6  The  Hash-location  Broadcast  Algorithm .  56 

3.6.1  The  Data  Structure .  57 

3.6.2  Hash-location  Broadcast:  Preprocessing  Phase .  58 

3.6.3  Hash-location  Broadcast:  Recognition  Phase  .  61 

3.6.4  Time  Complexity .  64 

3.7  Implementation  Details  .  65 

3.8  Implementation  Results  /  Scalability .  69 

4  Distributions  of  Invariants  71 

4.1  Rigid  Transformation .  72 

4.2  Similarity  Transformation .  74 

4.3  Affine  Transformation .  75 




5  Parallelism  Revisited  83 

5.1  Rehashing .  84 

5.2  Symmetries  and  Foldings  .  88 

5.3  Timing  Results .  93 

6  Noise  Modeling  96 

6.1  Performance  in  the  Presence  of  Noise .  98 

6.2  Modeling  Positional  Noise . 105 

7  Bayesian  Interpretation  112 

7.1  Abstract  Formulation  of  Geometric  Hashing . 113 

7.2  Updating  Formulas  and  Conditional  Independence . 123 

7.3  Reasoning  with  Parts . 125 

7.4  Conditional  Independence . 127 

7.5  Density  Functions  . 133 

7.6  Bayesian  Geometric  Hashing  . 137 

7.7  Exact  versus  Approximate  Matching . 140 

7.8  False  Alarm  Rates . 144 

7.9  The  Formulas:  Exact  Matching . 146 

7.10  The  Formulas:  Approximate  Matching . 154 

8  Experimental  Results  160 

8.1  Off-Line  Preprocessing . 160 

8.2  The  Two-level  Randomized  Algorithm . 163 

8.3  Results . 168 




9 Conclusion 184


9.1  Summary  of  Results . 184 

9.2  Future  Research  Directions . 187 

A  Some  Details  Regarding  the  Derivation  of  Eqn.  6.6  188 




List  of  Tables 


2.1 The time complexities for the preprocessing and recognition phases,
for several transformations . 29

8.1  The  thirty-two  models  of  the  database . 162 




List of Figures

2.1 Model Mi consisting of five points . 19

2.2 Determining the hash table entries when points 4 and 1 are used
to define a basis. The models are allowed to undergo rotation,
translation and scaling . 20

2.3 The locations of the hash table entries for model Mi. Each entry is
labeled with the information “model Mi” and the basis pair (i,j)
that was used to generate the entry. The models are allowed to
undergo rotation, translation and scaling . 21

2.4 Determining the hash table bins that are to be notified when two
arbitrary image points are selected as a basis. The allowed
transformation is similarity . 22

2.5 The coordinate system defined by a two-point basis . 24

2.6 The coordinate system defined by a three-point basis . 28

3.1 The two stages of the parallel p-product computation for the simple
case where p=3 (3-product) . 43

3.2 Simple Radix Sort on a Hypercube . 47

3.3 Radix Sort: an illustration . 47



3.4 The first pass of the preprocessing phase for the case where the
basis tuple consists of two points . 51

3.5 The third stage of the second pass of the preprocessing phase for
the case where the basis tuple consists of two points . 52

3.6 The recognition phase of the parallel geometric hashing connectionist
algorithm, for the case where the basis tuple consists of two
points. Note how tokens flow from one set via connections to the
next set . 55

3.7 Hash-location broadcast algorithm: the preprocessing phase for the
case where the basis tuple consists of two points . 60

3.8 The recognition phase of the parallel hash-location broadcast
algorithm, for the case where the basis tuple consists of two points . 63

3.9 Hash bin occupancy for a typical database; the height is
proportional to the length of the corresponding hash bin list . 67

3.10 Average time required for a single basis probe, as a function of the
number of processors in the Connection Machine. The database
contains 1024 models of 16 points each, and the scenes contain 200
points . 70

4.1 The distribution over the space of invariants, and several of its
contours for the case of point features that are generated by the
Gaussian process $\mathcal{N}\left(0, \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\right)$.
The allowed transformation is rigid . 73

4.2 The distribution over the space of invariants, and several of its
contours for the case of point features that are uniformly distributed
over the unit disc. The allowed transformation is rigid . 74




4.3 The distribution over the space of invariants, and several of its
contours for the case of point features that are generated by the
Gaussian process $\mathcal{N}\left(0, \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\right)$.
The allowed transformation is similarity . 75

4.4 The distribution over the space of invariants, and several of its
contours for the case of point features that are generated by the
Gaussian process $\mathcal{N}\left(0, \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\right)$.
The allowed transformation is affine . 78

4.5 Correspondence between feature and hash space regions: if p lies
in the region of the feature space marked i, the computed invariant
tuple will lie in the region i’ of the space of invariants . 79

4.6 Several of the contours of the hash table distribution. The model
features are uniformly distributed over the unit disc, and the
allowed transformation is affine . 81

5.1 Hash table equalization for the case of rigid transformations and
point features generated by the Gaussian process
$\mathcal{N}\left(0, \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\right)$.
Left: the expected distribution of remapped invariants. Right:
several of the distribution’s contours . 86

5.2 Hash table equalization for the case of rigid transformations and
point features uniformly distributed over the unit disc. Left: the
expected distribution of remapped invariants. Right: several of the
distribution’s contours . 87




5.3 Hash table equalization for the case of similarity transformations
and point features generated by the Gaussian process
$\mathcal{N}\left(0, \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\right)$.
Left: the expected distribution of remapped invariants. Right:
several of the distribution’s contours . 88

5.4 Hash table equalization for the case of affine transformations and
point features generated by the Gaussian process
$\mathcal{N}\left(0, \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\right)$.
Left: the expected distribution of remapped invariants. Right:
several of the distribution’s contours . 89

5.5 Symmetries in the storage pattern of the hash entries. Top: if
no rehashing is used, the hash entries are symmetric with respect
to the center of the coordinate system of the space of invariants.
Bottom: when rehashing is used, the rehashed entries have the
same abscissa but are a distance $\pi$ apart. These observations hold
true for both the rigid and similarity transformations . 91

5.6 Symmetries in the storage pattern of the hash entries under the
affine transformation. Top: if no rehashing is used, the hash entries
are symmetric with respect to the line that is at a 45 degree angle
with the horizontal axis. Bottom: when rehashing is used, the
rehashed entries have the same abscissa and $\theta$ values of opposite
signs . 92




5.7 Average time that the connectionist algorithm requires for a single
basis probe, as a function of the number of processors in the
Connection Machine. The database contains 1024 models of 16
points each; the points have been generated by a Gaussian process
and the scenes contain 200 points. The allowed transformation is
similarity and rehashing is used . 94

6.1 Similarity Transforms: the expected percentage of model/basis
combinations receiving exactly k votes. Top: the models’ feature
points are distributed according to a Gaussian of σ=1. Bottom: the
models’ feature points are distributed uniformly over the unit disc.
In both cases, the database contained 512 models, each consisting
of 16 points . 99

6.2 Percentage of the embedded model’s bases receiving k votes when
used as probes, for different amounts of Gaussian noise. The
models can only undergo similarity transformations. Top: the models’
points are distributed according to a Gaussian of σ=1. Bottom:
the models’ points are distributed uniformly over the unit disc. In
both cases, the database contained 512 models, each consisting of
16 points . 100



6.3 Regions of the hash table that would need to be accessed in the
case of Gaussian error in the positions of the point features. The
models are allowed to undergo a similarity transformation. The
left graph of each pair shows the feature space domain, whereas
the right shows the space of invariants. For presentation purposes,
the amount of Gaussian error was deliberately large . 103

6.4 Regions of the hash table that would need to be accessed in the
case of Gaussian error in the positions of the point features. The
transformation class is affine transformations. The left graph of
each pair shows the feature space domain, whereas the right shows
the space of invariants. For presentation purposes, the amount of
Gaussian error was deliberately large . 104

7.1 The preprocessing phase: for each model and for every N-tuple of
points in the model, a hash location is computed, and an entry
is recorded in the space of invariants at that location. The entry
is tagged with the information concerning the model identity and
model features that were used to compute the position . 117

7.2 The recognition phase and voting process of Bayesian geometric
hashing: using N-tuples of image features, locations in the space
of invariants are computed, and nearby entries are accessed. Each
entry is tagged with a model number and a set of model features,
which can be paired with the image features used to compute the
hash location to form a candidate interpretation. Interpretations
are then given weighted votes . 119




7.3 The probability density function of hashes in the space of invariants
which are generated by image features not belonging to the model
that is embedded in the image . 131

7.4 The probability density function of the hashes generated by the
features of a model that is embedded in an image. In this example,
n-c=6 . 132

7.5 An exact-matching hypothesis as compared to an approximate-matching
hypothesis. Note that in the case of an approximate-matching
hypothesis, there is a greater range of uncertainty in the
predicted image features that arise as a result of the remaining
features of the model . 142

7.6 The steps for a probe with a single basis set during the recognition
phase of the Bayesian geometric hashing algorithm for point
pattern recognition . 153

8.1 The edge maps and the selected feature points for the database
models of the F-16 Falcon, the Ford Econoline 150 and the Sea
Harrier . 164

8.2 Several of the contours for the effective speedup function. The
horizontal axis corresponds to the fraction of the image features that
is considered by the probe selection algorithm. The vertical axis
corresponds to the least number of model features that one expects
to see in the selected subset. The different contours correspond to
the values of the effective speedups, and were taken at heights 1.0,
1.8, 2.0, 2.5, 3.0, 3.5, 4.0 and 4.36 respectively . 167




8.3 A test image for the recognition algorithm: the photograph of an
F-16 . 171

8.4 The edge map extracted by the Cox-Boie edge detector (the value
of σ was 2.0) for the F-16 test image. Also shown are the 80
automatically extracted features . 172

8.5 Another test image: the photograph of a Sea Harrier. The airplane
at the bottom of the picture is a Hunter T-8M . 173

8.6 The edge map extracted by the Cox-Boie edge detector (the value
of σ was 2.0) for the Sea Harrier test image. Also shown are the
169 automatically extracted features . 174

8.7 The test image of a Ford Econoline 150 . 175

8.8 The edge map extracted by the Cox-Boie edge detector (the value
of σ was 3.4) for the Ford Econoline 150 test image. Also shown
are the 98 automatically extracted features . 176

8.9 The output of the implementation of our system on the Connection
Machine. The test input (F-16) is shown on the top left. The edge
map together with the automatically extracted point features is
shown on the top right; the basis selection that led to recognition
is also marked. A total of 22 basis selections was required, and
the elapsed time was 40.5 seconds (NB. this figure does not include
the edge detection and feature extraction stages). The bars above
each of the 9 top retrieved models provide a length encoding of
the total accumulated evidence for the corresponding model/basis
combination. The retrieved database model, appropriately scaled,
rotated and translated, is shown overlaid on the test input . 178




8.10 The F-16 test input with the retrieved model overlaid on it. The
recovered transformation (rotation, translation and scaling) was
based solely on the basis pair, and not on a best least-squares
match of all corresponding feature pairs . 179

8.11 The output of the implementation of our system on the Connection
Machine. The test input (Sea Harrier) is shown on the top left. The
edge map together with the automatically extracted point features
is shown on the top right; the basis selection that led to recognition
is also marked. A total of 4 basis selections was required, and the
elapsed time was 15.7 seconds (NB. this figure does not include
the edge detection and feature extraction). The bars above each of
the 9 top retrieved models provide a length encoding of the total
accumulated evidence for the corresponding model/basis combination.
The retrieved database model, appropriately scaled, rotated
and translated, is shown overlaid on the test input . 180

8.12 The Sea Harrier test input with the retrieved model overlaid on it.
The recovered transformation (rotation, translation and scaling)
was based solely on the basis pair, and not on a best least-squares
match of all corresponding feature pairs . 181




8.13 The output of the implementation of our system on the Connection
Machine. The test input (Ford Econoline 150) is shown on the
top left. The edge map together with the automatically extracted
point features is shown on the top right; the basis selection that
led to recognition is also marked. A total of 4 basis selections was
required, and the elapsed time was 9.1 seconds (NB. this figure
does not include the edge detection and feature extraction). The
bars above each of the 9 top retrieved models provide a length
encoding of the total accumulated evidence for the corresponding
model/basis combination. The retrieved database model, appropriately
scaled, rotated and translated, is shown overlaid on the test
input . 182

8.14 The Ford Econoline 150 test input with the retrieved model overlaid
on it. The recovered transformation (rotation, translation and
scaling) was based solely on the basis pair, and not on a best
least-squares match of all corresponding feature pairs . 183


Chapter  1 


Introduction 


This dissertation addresses the problem of model-based object recognition of
three-dimensional objects in two-dimensional grayscale images. In particular,
we describe the design, analysis, implementation and testing of a recognition
system that can identify three-dimensional objects from real-world grayscale
photographs using a database of stored models. The models in the image can be
rigid-, similarity-, or affine-transformed versions of prototype models in the
database.

By object recognition we mean:

•  the  recovery  of  the  imaged  object’s  identity,  and 

•  the  recovery  of  the  transformation  that  the  model  has  undergone. 

The problem of object recognition is a fundamental one in the fields of computer
vision and robotics. Perhaps the most promising research direction in image
analysis, and the one most likely to lead to industrial and commercial
applications, is the area of object recognition where the search is confined to a
finite set of observable models.
1.1 Object Recognition: the four stages


In every object recognition system, one can distinguish four stages: data
acquisition, feature extraction, matching, and verification. We examine each of
these stages in more detail.

1.1.1  Data  Acquisition 

Data acquisition is carried out by means of “sensors,” devices that are
sensitive to a particular physical modality. Commonly used sensors respond to
one of the following: X-rays, visible light, infrared, microwaves, and
ultrasound. Sensors can be either active, in which case they emit an energy beam
and subsequently record the signal that is returned from the scene objects, or
passive, in which case they only record energy that already emanates from the
scene.

The output of a sensor is typically a discrete-valued function that results from
sampling the input signal at regularly spaced intervals.¹ The spatial pattern of
the sample points is called a “tessellation.” For two-dimensional signals, the
rectangular tessellation has been used in the overwhelming majority of systems.
Recently, a prototype light sensor based on a hexagonal tessellation has made
its appearance [96].

¹ Actually, the recorded value is not the value of the function at the location
where the sampling takes place, but rather the function’s integral over a very
small area of the sensor.

Whether the tessellation is rectangular, hexagonal, or triangular, regularly
tessellated sensors operating at 30 frames/sec produce large amounts of data
that typically cannot be processed and analyzed in real time with today’s
available computing power. To address this problem, fovea-like sensors have
also been suggested [9]. The resolution of a foveated sensor is not constant
across the field of view, but instead decreases from the center of the field
toward the periphery, similarly to the human fovea [63]. An immediate result of
such a tessellation is a considerable reduction in the amount of data that needs
to be processed, to the point that real-time vision tasks might become feasible.
However, such a tessellation also necessitates the development of new image
processing algorithms, a task that is far from straightforward [100].
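To make the data-volume argument concrete, the sketch below compares the data rate of a uniform 512×512 sensor with that of a hypothetical log-polar (foveated) layout of 64 rings by 128 sectors. Only the 30 frames/sec rate comes from the text; the frame size, the 8-bit sample depth, the log-polar geometry and the function names are illustrative assumptions, not parameters of the sensors cited above.

```python
FRAME_RATE = 30          # frames per second, as stated in the text
BYTES_PER_SAMPLE = 1     # assumption: 8-bit grayscale samples


def rectangular_samples(width: int, height: int) -> int:
    """Samples per frame for a uniform rectangular tessellation."""
    return width * height


def log_polar_samples(rings: int, sectors: int) -> int:
    """Samples per frame for a hypothetical log-polar layout: `rings`
    concentric rings (radii growing geometrically, so cells get larger
    toward the periphery), each divided into `sectors` angular cells."""
    return rings * sectors


def bytes_per_second(samples_per_frame: int) -> int:
    """Raw data rate produced by a sensor at FRAME_RATE frames/sec."""
    return samples_per_frame * BYTES_PER_SAMPLE * FRAME_RATE


rect = rectangular_samples(512, 512)   # 262144 samples per frame
fovea = log_polar_samples(64, 128)     # 8192 samples per frame

print(bytes_per_second(rect))   # 7864320 (about 7.9 MB/s)
print(bytes_per_second(fovea))  # 245760 (about 0.25 MB/s)
print(rect // fovea)            # 32
```

Under these assumptions the foveated layout carries 32 times less data per frame; the actual ratio depends entirely on the ring and sector counts chosen.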

In what follows, we will concern ourselves with sensors that are sensitive to
visible light and have a rectangular tessellation. The output of such sensors is
a 2D grayscale (or color) intensity image.

1.1.2  Feature  Extraction 

When  presented  with  an  intensity  image,  an  object  recognition  system’s  task  is 
to  identify  and  locate  the  object(s)  present  in  the  scene  that  generated  the  image. 
The  first  step  toward  this  goal  is  the  reduction  of  the  amount  of  the  input  data. 

Given  that  the  object,  or  objects,  of  interest  occupy  only  a  small  portion  of 
the  viewed  scene,  most  of  the  data  present  in  the  sensor’s  output  is  extraneous 
and  not  relevant  to  the  task  at  hand.  Feature  extraction  attempts  to  identify 
interesting  pieces  of  the  input  signal. 

It is not easy to define what constitutes a feature. The definition of a feature
is directly related to the model representation, i.e., the way each of the objects
recognizable by the system is represented and stored in the computer. Such a
representation is in turn related to the way a model can appear in the context
of the sensor data [11], and is application specific. Traditionally, the following
image characteristics have been used as features: linear and curvilinear segments,
curvature extrema, curvature discontinuities, conics, etc.
Due to the importance of this stage’s output, the problem of feature extraction
has received a great deal of attention over the years; in particular, the topic of
edge detection (i.e., extraction of linear and curvilinear segments from gray-level
intensity data) has attracted a large number of researchers. This interest was
motivated by experimental evidence attesting to the importance of boundaries for
the human visual system [2]. A large number of techniques and algorithms have
been developed for feature extraction. However, a presentation of these
techniques is beyond the scope of this dissertation, and the reader is referred
to one of the relevant textbooks such as [7,43,63].

Various features can be combined together to generate object descriptions.
The way the various features are combined depends on the application and/or
the method. In a number of approaches, geometric information is derived from
the extracted features: for example, the position of a certain feature, the
curvature of a constant-curvature curvilinear segment, the eccentricity of a conic,
etc. This information is used in the representation of the corresponding object.
Also, relational information, e.g., relative distance or relative orientation
between features, has proven useful in object recognition.

Before we conclude this section, we mention an important issue pertaining
to feature extraction, that of sensor noise. Sensing devices are not perfect;
they introduce “measurement” and “amplification stage” noise. This noise
manifests itself as small random perturbations of the values of the sampled
modality and can cause problems during the feature extraction process.
Consequently, a preprocessing or “filtering” stage typically precedes the
feature-extraction stage; the goal of the filtering stage is to reduce the
random perturbations introduced during sampling.
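The filtering-then-detection pipeline can be illustrated on a single synthetic scanline. The sketch below assumes additive Gaussian sensor noise, smooths with a sampled Gaussian kernel, and reports the position of the largest forward difference as the edge. It is a generic textbook scheme chosen for illustration, not a specific detector from the references; the step profile, noise level, `sigma`, and all function names are hypothetical.

```python
import math
import random


def gaussian_kernel(sigma: float) -> list[float]:
    """Sampled 1D Gaussian, truncated at +/- 3 sigma and normalized to sum 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal: list[float], sigma: float) -> list[float]:
    """Filtering stage: convolve with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)   # clamp to valid range
            acc += w * signal[j]
        out.append(acc)
    return out


def strongest_edge(signal: list[float]) -> int:
    """Feature extraction: index i maximizing |signal[i+1] - signal[i]|."""
    diffs = [abs(signal[i + 1] - signal[i]) for i in range(len(signal) - 1)]
    return max(range(len(diffs)), key=diffs.__getitem__)


# Synthetic scanline: a step from intensity 20 to 200 between indices 49
# and 50, corrupted by additive Gaussian "sensor" noise (std. dev. 25).
rng = random.Random(0)
clean = [20.0] * 50 + [200.0] * 50
noisy = [v + rng.gauss(0.0, 25.0) for v in clean]

edge = strongest_edge(smooth(noisy, sigma=2.0))
print(edge)  # close to 49, the last sample before the step
```

Smoothing trades localization for robustness: a larger `sigma` suppresses more noise but blurs the step over more samples.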
1.1.3 Matching


Once  the  set  of  models  that  are  to  be  recognizable  by  the  system  has  been  chosen, 
the  form  of  representation  of  the  models  must  be  fixed.  As  already  noted,  the 
representation  of  the  models  dictates  the  feature  types  that  the  feature-extraction 
stage  will  attempt  to  detect  in  the  sensory  output. 

Once  the  types  of  features  have  been  determined,  a  database  is  built  containing 
information  about  those  features  that  can  be  identified  in  objects  from  the  set  of 
recognizable  models.  The  task  of  the  matching  stage  is  to  identify  the  model,  or 
models,  whose  features  approximately  match  a  (sub-)set  of  the  features  generated 
by  the  feature-extraction  module. 

The  matching  stage  is  the  most  crucial  component  of  an  object  recognition 
system.  A  number  of  techniques  have  been  developed  toward  this  end.  But  in 
all  cases  there  is  a  trade-off  between  the  reliability  and  the  computation  cost: 
techniques  that  produce  very  reliable  results  are  computationally  heavy,  and  vice 
versa.  Most  matching  algorithms  have  been  based  on  cross-correlation  techniques, 
tree-  or  graph-search,  clustering,  or  indexing:  the  technique  generally  depends  on 
the  feature  type. 

The matching technique should allow for partial occlusion, rotation,
translation, and scale changes, as well as for small amounts of data perturbation.
The output of the matching stage is a set of hypotheses regarding the identity of
the models that are embedded in the scene. Sometimes, a measure of belief can be
associated at this stage with each of the models in the set; this measure allows
the hypotheses to be ranked relative to one another. Together with the set of
models, the matching stage also recovers the transformation that each
corresponding model is assumed to have undergone.

Several matching techniques can be viewed as “filters” or “sieves” that
considerably reduce the number of candidate hypotheses as to the identity of the
object(s) in the scene and the transformation the objects have undergone. In this
dissertation, we will concentrate on the geometric hashing approach to
matching [45,59,61,62]. This approach uses geometric invariants to represent the
various models; the same invariants are also used to index into the database of
models (see also section 2.2.1).
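The indexing idea can be sketched in a few lines for the similarity case with a two-point basis. In this minimal sketch, which illustrates the principle rather than reproducing the implementation described in later chapters, each ordered pair of model points defines a frame (origin at the first point, unit vector toward the second). The coordinates of the remaining points in that frame are unchanged by rotation, translation and scaling, so they are quantized and used as hash keys. At recognition time, an image basis pair is chosen and each remaining image point votes for the (model, basis) entries it retrieves. The bin size `BIN`, the 3×3 neighborhood probe, the function names, and the two toy models are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

Point = tuple[float, float]
BIN = 0.05  # assumed quantization step in the space of invariants


def basis_coords(p: Point, b0: Point, b1: Point) -> Point:
    """Coordinates of p in the frame with origin b0, x-axis e1 = b1 - b0 and
    y-axis e2 = e1 rotated by 90 degrees, in units of |e1|. These values are
    invariant under rotation, translation and scaling."""
    ex, ey = b1[0] - b0[0], b1[1] - b0[1]
    dx, dy = p[0] - b0[0], p[1] - b0[1]
    norm2 = ex * ex + ey * ey
    return (dx * ex + dy * ey) / norm2, (-dx * ey + dy * ex) / norm2


def to_bin(uv: Point) -> tuple[int, int]:
    return round(uv[0] / BIN), round(uv[1] / BIN)


def build_table(models: dict[str, list[Point]]):
    """Preprocessing: for every model and ordered basis pair (i, j), record
    (model, basis) in the bin of each remaining model point."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i in range(len(pts)):
            for j in range(len(pts)):
                if i == j:
                    continue
                for k, p in enumerate(pts):
                    if k not in (i, j):
                        key = to_bin(basis_coords(p, pts[i], pts[j]))
                        table[key].append((name, (i, j)))
    return table


def recognize(table, image: list[Point], a: int, b: int) -> Counter:
    """Recognition for one image basis (a, b): each remaining image point
    casts one vote for every (model, basis) found in its bin or the 8
    neighboring bins (the neighborhood absorbs rounding at bin borders)."""
    votes = Counter()
    for k, p in enumerate(image):
        if k in (a, b):
            continue
        bu, bv = to_bin(basis_coords(p, image[a], image[b]))
        hits = set()
        for du in (-1, 0, 1):
            for dv in (-1, 0, 1):
                hits.update(table.get((bu + du, bv + dv), ()))
        votes.update(hits)
    return votes


def similarity(p: Point, angle: float, scale: float, t: Point) -> Point:
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * p[0] - s * p[1]) + t[0],
            scale * (s * p[0] + c * p[1]) + t[1])


models = {  # two hypothetical six-point models
    "A": [(0.0, 0.0), (1.0, 0.0), (0.3, 0.8), (1.4, 1.1), (-0.5, 0.6), (0.9, -0.7)],
    "B": [(0.0, 0.0), (0.6, 0.9), (1.5, 0.2), (-0.4, -0.3), (0.2, 1.6), (1.1, 1.3)],
}
table = build_table(models)

# Scene: model A rotated by 30 degrees, scaled by 2 and translated,
# plus two clutter points. Image points 0 and 1 are chosen as the basis.
image = [similarity(p, math.radians(30), 2.0, (5.0, -3.0)) for p in models["A"]]
image += [(4.0, 4.0), (7.5, -1.0)]

votes = recognize(table, image, 0, 1)
print(votes.most_common(1))  # [(('A', (0, 1)), 4)]
```

The transformed copy of model A retrieves the entry stored for the matching basis (0, 1) from each of its four non-basis points, while the two clutter points fall far from every entry stored for that basis.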

1.1.4  Verification 

The  output  of  the  matching  module  is  a  set  of  hypotheses  regarding  the  identity  of 
the  object  or  objects  embedded  in  the  scene.  These  hypotheses  are  subsequently 
piped  into  the  verification  stage  whose  purpose  is  to  evaluate  the  quality  of  the 
hypotheses  and  either  accept  or  reject  them. 

In order to evaluate the quality of the hypotheses that are generated, the
vision system projects each candidate model onto the scene and computes the
fraction of the model accounted for by the available input data. “Optimal”
cutoff values (thresholds) are typically predetermined either empirically or in
an ad hoc fashion. These thresholds can be either model-dependent or constant
across the various models, and their use allows the verification module to decide
whether to accept or reject each hypothesis.

The majority of vision systems use empirically determined thresholds very
successfully. In some cases, and under certain simplifying assumptions, a
theoretical analysis makes it possible to compute threshold values that are a
function of the scene and of the complexity of the model [37]. However, to our
knowledge, no current operational system makes use of theoretically determined
thresholds.


1.2  The  Scope  of  this  Dissertation 

In  this  dissertation,  we  concentrate  on  four  different  issues: 

•  Exploitation  of  parallelism  for  performing  object  recognition; 

•  Analysis  of  the  expected  probability  distributions  of  invariants  over  the 
hash  space  for  a  number  of  transformation  and  model  feature  distribution 
combinations; 

• Analysis of how sensor noise propagates through the stage of invariant
computations; and

•  Formulation  of  a  Bayesian  interpretation  of  model  matching  with  geometric 
hashing. 

1.2.1  Exploitation  of  Parallelism 

In  chapter  3,  we  present  two  data-parallel  algorithms  for  performing  geometric 
hashing. 

• The first algorithm is designed for an SIMD hypercube-based parallel
architecture; the algorithm has a “connectionist” flavor, with information flowing
via communication patterns.

•  The  second  algorithm  is  more  general,  based  on  data  broadcast  capabilities, 
and  suitable  for  any  type  of  parallel  architecture. 

In addition to these two algorithms, we also present a novel radix-sort
algorithm for SIMD hypercube-based architectures.

1.2.2 Distributions of Invariants


In chapter 4, we derive precise as well as approximate formulas and qualitative
results for the statistical distribution of geometric invariants. The results are
derived for a number of transformation (rigid, similarity, affine) and
feature-distribution (uniform, Gaussian) combinations.

The analysis corroborates that the non-uniform distribution of invariants over
hash space is endemic to all indexing-based approaches to model-based object
recognition. In chapter 5, we show how knowledge of the index distributions can
be used to develop techniques that result in much faster implementations of
indexing-based object recognition methods.

1.2.3  Modeling  of  Noise 

In chapter 6, we study the behavior of geometric hashing techniques in the
presence of noise.

We  show  that,  under  the  assumption  that  the  noise  introduced  by  the  sensor 
and  the  feature-extraction  module  can  be  modeled  as  a  Gaussian  random  process, 
the  computed  indices  follow  a  Gaussian  distribution  to  a  first  order  approximation. 

We also perform the noise analysis for the similarity and affine
transformations, and show that the effect of noise is more pronounced in the
affine case.

1.2.4  Bayesian  Interpretation 

In chapter 7, we present an interpretation of geometric hashing which shows that
the algorithm can be viewed as a Bayesian, maximum-likelihood object recognition
method; the hypotheses span the discrete collection of models and the discrete
pairings of image features to model features.


We make use of the results from chapters 4 and 6 to show how an adaptive
weighted-voting scheme can be used to accumulate evidence for model/basis tuples
in the geometric hashing framework.

The validity of our theory is demonstrated in chapter 8, where we describe a
complete object recognition system that makes use of these ideas. The system is
implemented on an SIMD hypercube-based parallel machine (a Thinking Machines
Corporation CM-2) and can recognize objects that have undergone a similarity
transformation, from a library containing the models of 14 aircraft and 18
production automobiles. The system is tested using real-world imagery, runs
rapidly, and exhibits exceptional performance.

Chapter 2


An  Introduction  to  Model-based 
Object  Recognition 


Most of the successful object recognition systems have been model-based. In
these systems, the search is confined to a finite set of observable models. A priori
information about these recognizable models is maintained in an appropriately
structured database. The type of information contained in the database depends
on the scheme by which models are represented, and also on the type of features
used (e.g. points, edges, conics). Apart from having given rise to a number of
successful systems, the model-based vision paradigm offers the possibility of a
well-defined, analyzable formulation.

During the matching stage, the task of a model-based vision system is to
determine the model identity (equivalently: a set of model features), and a
transformation for each model that is present in the scene. The transformation
brings the set of model features into correspondence with a (possibly proper)
subset of image features. During the search for the model identity and the
appropriate transformation, the system has access to the information that is
stored in the system’s database. The task is further complicated if the objects
contained in the input image are partially occluded, and if the attributes of the
extracted features (e.g. position, orientation) are corrupted by noise.

In general, object recognition involves some type of search. The search can
take place over the set of extracted features and recognizable models: in this
case, the system attempts to determine correspondences between (sub-)sets of
model and image features that are consistent with the permissible
transformations. Since the number of all possible “pairings” between model and
image feature sets is exponential in the cardinality of the feature sets,
straightforward implementations result in an unfavorable time complexity. In an
attempt to efficiently prune the tree search, a number of systems have made use
of local constraints with considerable success.

Alternatively,  the  search  can  take  place  in  the  space  of  transformations:  in  this 
case,  the  system  attempts  to  determine  a  transformation  that  brings  (sub-)sets 
of  model  and  image  features  in  correspondence. 

These  two  approaches,  namely  the  search  over  the  space  of  extracted  features 
and  the  search  over  the  space  of  transformations,  represent  the  main  paradigms 
that  have  dominated  the  field  of  object  recognition  in  the  last  ten  years. 

2.1  A  Survey  of  the  Object  Recognition  Field 

In this section, we briefly describe some of the most representative object
recognition systems that have been developed during the last two decades.

One of the earliest object recognition systems was developed by Roberts [82].
The system was able to recognize convex polyhedral objects under the weak
perspective transformation. It controlled and pruned the search by considering
only vertices that were connected by an edge, and thus could not handle
occlusion. Unlike Roberts’ system, the models in ACRONYM [19] were generalized
cylinders. ACRONYM used symbolic constraints to control and effectively prune
the search, and could handle both noise and occlusion.

A related approach to that of ACRONYM was taken in Goad’s system [35]. The
system used quantitative (as opposed to symbolic) constraints to control the
search. Goad’s system also introduced the notion of the two-stage (off-line stage,
on-line stage) recognition algorithm, where data precomputed during a first phase
(off-line) are used during the phase of actual recognition (on-line) in order to
speed up the processing.

The use of geometric constraints (such as distance and angle) as an efficient
way of pruning the search while matching image and model features was advocated
by Bolles in his LFF and 3DPO systems [15,16]; LFF is used to recognize 2D
objects from intensity images, whereas 3DPO is used for recognition of 3D objects
from range data. Geometric constraints are also used in the RAF system of
Grimson [38,39]. In Grimson’s system, the search is structured around an
interpretation tree, and exhibits exponential time complexity if the input image
(intensity data) includes spurious data. More recently, the BONSAI system [31]
exploits unary and binary constraints to control the search of the interpretation
tree and prune the search space; the input to BONSAI comprises range images of
parts that have been designed using a CAD tool.

The  HYPER  system  of  Ayache  and  Faugeras  [3]  also  belongs  to  the  category 
of  systems  that  attempt  to  determine  correspondences  between  sets  of  model  and 
image  features.  HYPER  is  used  to  recognize  2D  objects  from  intensity  images. 
However, its success is dependent on the quality of the polygonal approximations
of the input image’s contours. Furthermore, it is sensitive to noise and does not
deal with the occlusion of edges.

Lowe’s system, SCERPO [68], is a complete object recognition system that
recognizes polyhedral 3D objects from intensity images under the perspective
transformation. SCERPO attempts to reduce the complexity of the search by
performing perceptual groupings of image features. However, it typically deals
with only one or two models in the model database at any given time.

More recently, work by Kak [53] shows that efficient algorithms can prove
beneficial in reducing the complexity of systems that search over the sets of
features. In particular, Kak uses bipartite matching in conjunction with the
notion of discrete relaxation to perform recognition of 3D objects using the
output of a structured-light scanner.

In addition, a number of other systems have been developed that search the
space of allowed transformations. The classic representative of this approach is
the generalized Hough transform [5,8,91]: the method is a generalization of the
Hough transform [44] and is used to detect arbitrary shapes. In the generalized
Hough transform framework, the recognition of objects is achieved by recovering
the transformation that brings a large number of model features into
correspondence with image features. The transformation is described in terms of
a set of transformation parameters, and votes for these parameters are
accumulated by hypothesizing matchings between subsets of model and image
features. The generalized Hough transform requires the quantization of a range
of values for each of the parameters, thus resulting in decreased accuracy. The
space requirements are exponential in the number of parameters.
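As a minimal sketch of the voting principle, the Python fragment below restricts the allowed transformations to 2D translations, so only two parameters are quantized; the bin width `cell` and the data are hypothetical. A full generalized Hough transform would also quantize rotation, scale, and any other parameters of the transformation class, which is what makes the accumulator grow exponentially with the number of parameters.

```python
from collections import Counter
from itertools import product

def hough_translation(model, image, cell=1.0):
    """Accumulate votes for quantized translations t = q - p over every
    hypothesized pairing of a model point p with an image point q."""
    acc = Counter()  # sparse accumulator: quantized (tx, ty) -> vote count
    for (px, py), (qx, qy) in product(model, image):
        key = (round((qx - px) / cell), round((qy - py) / cell))
        acc[key] += 1
    return acc

# Hypothetical data: the model shifted by (5, 1), plus one clutter point.
model = [(0, 0), (2, 0), (0, 3)]
image = [(5, 1), (7, 1), (5, 4), (9, 9)]
acc = hough_translation(model, image)
print(acc.most_common(1))  # [((5, 1), 3)]
```

A sparse `Counter` stores only the cells that actually receive votes; this saves memory in practice but does not change the number of cells that can potentially be addressed.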

The system by Mundy and Thompson [69,70] uses large Hough tables to perform
recognition of 3D objects from 2D input data, under the weak perspective
transformation. To constrain the space of possible transformations, the system
uses the notion of the “vertex-pair.” A vertex pair consists of two vertices and
the two edges forming one of the vertices. An improved version of Mundy and
Thompson’s system [85] remedies the problem of fixed parameter quantization by
iteratively refining the quantization around volumes of interest (histogram
peaks) until the required precision is achieved. A similar system is the one of
Linnainmaa [64], which introduced the notion of the “triangle-pair”: triplets of
vertices from the image are matched against triplets of model vertices in order to
hypothesize a transformation under the perspective projection model; however,
the system provides multiple alternatives that require examination.

The approaches of the last three systems can be considered as special cases of
a more general scheme called “alignment” [97]. In alignment, one seeks a model
from the model database together with a transformation from the allowed class of
transformations such that the object being viewed and the transformed model are
in correspondence; for those transformations where the number of corresponding
features exceeds a certain threshold, a verification procedure is invoked.
Another alignment-based system, RANSAC [30], is used to recognize objects under
perspective transformation, for a known camera position. Huttenlocher’s ORA
system [50], on the other hand, performs recognition assuming the weak
perspective transformation model. More recently, alignment ideas have been
combined with efficient string matching in order to perform unoccluded polygonal
object recognition [83].

The approach of Ullman and Basri [98] is considerably different. The basic
idea here is that each topologically different model view can be expressed as a
linear combination of a small number of 2D views of the model. The method
assumes that the transformation of the model to the scene can be modeled by an
orthographic projection, and can handle 3D rigid as well as non-rigid
transformations of the models. Scene clutter, however, causes considerable
problems. The scheme has been treated mostly theoretically; some preliminary
results indicate reasonable performance with databases containing a handful of
objects.

Another general scheme that also involves search over the space of
transformations is the geometric hashing scheme. Although based on the same
geometric principles as alignment, geometric hashing differs from alignment in
its algorithmic approach. Since this dissertation revolves around the geometric
hashing scheme, we chose to survey the object recognition systems that have been
based on geometric hashing ideas in a separate section (section 2.3).

All  of  the  systems  that  have  been  described  so  far,  as  well  as  those  based  on 
the  geometric  hashing  scheme,  typically  represent  the  database  models  using  a 
small  number  of  homogeneous,  local  features.  These  features  “define”  the  objects. 
Furthermore,  the  objects  under  consideration  are  treated  in  isolation  from  the  rest 
of  the  scene.  Unlike  these  systems,  CONDOR  [92]  is  the  first  system  that  takes 
the  approach  of  performing  context  recognition  first,  and  then  instantiates  the 
individual components. Natural objects such as sky, ground, and foliage are
included in the system’s vocabulary. A special-purpose database contains all the
necessary information about the world. The introduction of context results in
increased flexibility at the expense of a major increase in computational
complexity. The input to the system can be any combination of intensity, range,
color or other data modalities. The output is a labeled 3D model of the input
image, with the labels referring to the object classes that can be recognized by
the system.

The system by Swain [93] can recognize deformable objects and substances
described  by  mass  nouns  by  making  use  of  color  information.  Thus  it  is  similar  in 
flavor to CONDOR. However, its use of precomputed invariants for the different
database models in a two-stage algorithm brings the system closer to the ones
that  are  based  on  hashing/indexing  ideas  (see  section  2.3).  Notably,  Swain’s 
system  can  recognize  objects  independent  of  background  and  viewpoint  variations, 
occlusion,  scale,  and  lighting  conditions. 

Vayda  and  Kak’s  INGEN  system  [99]  performs  object  classification  based  on 
the  overall  shape  of  the  object.  For  certain  object  recognition  tasks  it  suffices  to 
categorize  the  objects  based  on  their  general  shape  (e.g.  parallelepiped,  cylinder, 
etc.)  and  independently  of  their  size.  Because  of  the  large  variations  in  size, 
feature-based  object  recognition  techniques  are  not  easily  applicable.  INGEN 
uses a hypothesize-and-verify approach to determine the pose and generic shape
of  objects  from  range  data.  Hypotheses  generated  for  each  region  in  the  segmented 
range  data  can  be  combined  using  either  information  contained  in  a  combinability 
graph,  or  proximity  and  continuity  heuristics.  Use  of  the  combinability  graphs 
controls  the  combinatorial  explosion  by  efficient  pruning  of  the  search  space. 

Another  system  with  no  knowledge  of  a  geometric  or  structural  model  for  each 
of  the  database  objects  is  the  one  by  Stark  and  Bowyer  [88].  In  their  system, 
object  classes  are  described  in  terms  of  the  functional  properties  shared  by  all  the 
3D objects in the class. The various functional properties are represented using
procedural  knowledge.  The  system  has  been  successfully  tested  with  a  database 
of  100  objects  belonging  to  the  “chair”  class;  the  output  of  a  CAD  tool  was  used 
to  provide  the  test  input  to  the  system. 

Dickinson [26] presents yet another approach to 3D object recognition from
intensity images. His system uses a small set of volumetric primitives which can
be assembled to form the objects that can be recognized. An important component
of the system is a hierarchy of 2D features (such as contours, faces, and groups of
faces) that are generated by projecting the primitives based on a set of
viewer-centered orientations; conditional probabilities capture the relation
between nodes
at  different  levels  of  the  hierarchy,  and  can  be  computed  off-line.  The  number  of 
orientations  is  fixed  and  thus  independent  of  the  number  of  models  the  system 
can  recognize.  During  recognition,  the  system  uses  a  bottom-up  approach  and 
precomputed  conditional  probabilities  to  extract  primitives  in  the  input  image,  as 
well  as  the  connectivities  of  the  primitives.  Using  the  primitive  and  connectivity 
information,  the  system  indexes  into  the  model  database  to  recover  the  identity 
of the viewed object. Bergevin’s PARVO system [10] takes a similar approach to
Dickinson’s but makes use of “geons” [12] as the modeling primitives.

Kriegman and Ponce’s approach [57] also exploits the relation between the
shape of intensity image contours and the models of 3D objects of revolution.
Under  the  assumption  that  the  image  contours  are  the  projections  of  either  surface 
discontinuities  or  occluding  contours,  Kriegman  and  Ponce  use  elimination  theory 
to  construct  the  implicit  equations  of  the  contours  under  perspective  and  weak 
perspective  projections;  the  equations  are  parameterized  by  the  object’s  position 
and  orientation.  In  the  current  system,  edge  segments  are  grouped  into  contours 
manually,  and  no  extraneous  data  are  fed  into  the  algorithm.  Also,  contours  that 
are  neither  surface  nor  occluding  discontinuities  are  manually  removed.  Although 
they  obtain  good  results,  the  success  of  the  approach  relies  heavily  on  the  quality 
of  segmentation. 
2.2 Indexing Methods


The  geometric  hashing/indexing  methods  represent  an  alternative,  efficient  and 
highly  parallelizable  approach  to  performing  pattern  matching.  These  methods 
borrow  from  the  technique  of  “hashing,”  a  fundamental  technique  in  the  field  of 
computer  science  [1,56]. 

The basic idea behind hashing is the partitioning of a possibly infinite data
set, $\mathcal{D}$, into a finite set of groups $G_i$, $i = 1, 2, 3, \ldots, q$. The
$G_i$'s form the hash table data structure. A hash function, $h$, whose domain is
the set from which the members of $\mathcal{D}$ take their values and whose range is
the set $\{1, 2, 3, \ldots, q\}$, provides the partitioning by assigning a data item
$x$ to the group $G_{h(x)}$. The data item $x$ is then said to belong to the group
$G_{h(x)}$, and the group's index $h(x)$ is called the hash value of the data item
$x$. Clearly, there is no unique choice for the hash function $h(\cdot)$. This
particular form of hashing is known as open hashing/closed addressing, and is one
of the two classic forms of hashing [1].

The importance of the hashing technique comes from the fact that, by
appropriately selecting the hash function, dictionary operations such as INSERT,
DELETE and MEMBER (see [1]) can be carried out in O(1) time on average.
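To make the definition concrete, here is a minimal Python sketch of open hashing (closed addressing). It assumes Python's built-in `hash` as the basis for a hypothetical hash function `h` with range {1, ..., q}; the class and method names are illustrative stand-ins for the INSERT, DELETE and MEMBER operations, not an API taken from [1].

```python
class OpenHashTable:
    """Open hashing / closed addressing: group G_i is a list of the data
    items x with h(x) = i, for i = 1, ..., q (stored 0-based internally)."""

    def __init__(self, q=8):
        self.q = q
        self.groups = [[] for _ in range(q)]

    def h(self, x):
        # Assumed hash function with range {1, ..., q}.
        return hash(x) % self.q + 1

    def _group(self, x):
        return self.groups[self.h(x) - 1]

    def insert(self, x):
        g = self._group(x)
        if x not in g:
            g.append(x)

    def delete(self, x):
        g = self._group(x)
        if x in g:
            g.remove(x)

    def member(self, x):
        return x in self._group(x)

table = OpenHashTable(q=4)
for x in ["a", "b", "c"]:
    table.insert(x)
table.delete("b")
print(table.member("a"), table.member("b"))  # True False
```

Each group is scanned linearly, so the O(1) average cost holds only if q grows in proportion to the number of stored items and h spreads them evenly over the groups.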

In geometric hashing, the collection of models to be stored in the database
is used during a preprocessing phase (thus executed “off-line”) in order to build
the hash table data structure. The hash function is selected in such a way that
the resulting hash table structure encodes geometric information pertaining to
small subsets of model features. This geometric information is encoded in a
highly redundant way. During the recognition phase, when presented with a
scene from which features are extracted, the hash table data structure is used to
index geometric properties of the scene features to candidate matching models.
A search over the scene features is still required. However, the geometric
hashing scheme obviates the search over the models and the model features that
is required by the interpretation tree and alignment methods. In the following
section, we examine the geometric hashing idea in more detail.

2.2.1  Geometric  Hashing  for  Model  Matching 


Features such as points, linear and curvilinear segments, corners, etc. are
extracted during the feature extraction stage (see section 1.1.2). Any such collection
of  features  can  be  represented  by  a  set  of  dots:  each  dot  represents  the  feature’s 
location;  associated  with  each  dot  is  a  list  of  one  or  more  attributes  (the  feature’s 
attribute  list)  which  depends  on  the  corresponding  feature’s  type. 


Figure 2.1 Model $M_1$ consisting of five points.


We  can  thus  confine  ourselves,  without  any  loss  of  generality,  to  the  problem 
of  recognition  of  clusters  of  point  features  (dot  patterns),  with  the  understanding 
that  each  such  point  may  have  associated  with  it  an  attribute  list.  In  the  simplest 
case  which  is  examined  below,  one  is  interested  only  in  the  positional  information 
of the features, and the attribute list is empty. In a more general vision system,
one wishes to recognize patterns of lines, corners, and other features, attached to
3D objects undergoing rigid 3D transformations and perspectively projected onto
the image plane. The geometric hashing algorithms extend to that case as well,
at the expense of more complicated transformation classes and implementation
issues.


Figure 2.2 Determining the hash table entries when points 4 and 1
are used to define a basis. The models are allowed to undergo rotation,
translation and scaling.


Suppose that we wish to perform recognition of patterns of point features
that may be translated, rotated and scaled (similarity transformations). Two
points are needed to define a basis. Figure 2.1 shows a model ($M_1$) consisting
of five dots with position vectors $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3, \mathbf{p}_4$ and $\mathbf{p}_5$, respectively.
We begin by scaling the model $M_1$ so that the magnitude of $\mathbf{p}_1 - \mathbf{p}_4$ in the
$Oxy$ system is equal to 1. Suppose now that we place the midpoint between dots
“4” and “1” at the origin of a coordinate system $Oxy$ in such a way that the
vector $\mathbf{p}_1 - \mathbf{p}_4$ has the direction of the positive $x$-axis. The remaining three
points of $M_1$ will land in three locations. Let us record in a quantized hash table,
in each of the three bins where the remaining points land, the fact that model
$M_1$ with basis “(4, 1)” yields an entry in this bin. This is shown graphically in
Figure 2.2.
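The similarity-invariant coordinates described above can be sketched in Python as follows. The coordinates of the five dots are hypothetical (Figure 2.1 gives no numerical values), and the bin size is an arbitrary choice. For an ordered basis (i, j), the sketch places the origin at the midpoint of dots i and j, takes the vector from dot i to dot j as the unit-length positive x-axis, and records one quantized entry per remaining dot.

```python
import math

def similarity_coords(p, a, b):
    """Coordinates of p in the frame with origin at the midpoint of a and b,
    positive x-axis along b - a, and unit length |b - a|."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    n2 = dx * dx + dy * dy                  # |b - a|^2
    rx = p[0] - (a[0] + b[0]) / 2           # p relative to the midpoint
    ry = p[1] - (a[1] + b[1]) / 2
    return ((rx * dx + ry * dy) / n2,       # component along b - a
            (-rx * dy + ry * dx) / n2)      # component along its 90-degree rotation

def entries_for_basis(model_id, points, i, j, bin_size=0.25):
    """Hash table entries (bin, (model, basis)) generated by the ordered basis (i, j)."""
    entries = []
    for k, p in points.items():
        if k in (i, j):
            continue
        u, v = similarity_coords(p, points[i], points[j])
        key = (math.floor(u / bin_size), math.floor(v / bin_size))
        entries.append((key, (model_id, (i, j))))
    return entries

# Hypothetical coordinates for a five-dot model such as M1; basis "(4, 1)".
m1 = {1: (0, 0), 2: (4, 0), 3: (1, 3), 4: (3, 2), 5: (2, 5)}
entries = entries_for_basis("M1", m1, 4, 1)
```

Because both the dot products and the cross term scale by the square of the similarity's scale factor, as does $|\mathbf{b} - \mathbf{a}|^2$, the returned coordinates do not change when all dots are rotated, translated and scaled together.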


Figure 2.3 The locations of the hash table entries for model $M_1$.
Each entry is labeled with the information “model $M_1$” and the basis
pair $(i, j)$ that was used to generate the entry. The models are allowed
to undergo rotation, translation and scaling.


Similarly, the hash table contains three entries of the form ($M_1$, (4, 2)), three
entries of the form ($M_1$, (4, 3)), etc. Each triplet of entries is generated by first
scaling the model $M_1$ so that the corresponding basis has unit length in the $Oxy$
coordinate system, and then by placing the midpoint of the basis at the origin
of the hash table in such a way that the basis vector has the direction of the
positive $x$-axis. The same process is repeated for each ordered basis, and for each
of the models in the database. Of course, some hash table bins may receive more
than one entry. As a result, the final hash table data structure will contain a list
of entries of the form (model, basis) in each hash table bin. Figure 2.3 shows the
locations of all the hash table entries for model $M_1$.

In the recognition phase, a pair of points ($\mathbf{p}_{\mu_1}$, $\mathbf{p}_{\mu_2}$) from the image is
chosen as a candidate basis. This ordered basis defines a coordinate system $Oxy$
whose center coincides with the midpoint of the pair; the direction of the basis
vector $\mathbf{p}_{\mu_2} - \mathbf{p}_{\mu_1}$ coincides with that of the positive $x$-axis. The magnitude
of the basis vector defines the “unit” length for $Oxy$.


Figure 2.4 Determining the hash table bins that are to be notified
when two arbitrary image points are selected as a basis. The allowed
transformation is similarity.


The  coordinates  of  all  other  points  are  then  calculated  in  the  coordinate  system 
defined  by  the  chosen  basis.  Each  of  the  remaining  image  points  is  mapped  to 
the  hash  table,  and  all  entries  in  the  corresponding  hash  table  bin  receive  a 
vote. Figure 2.4 shows this graphically. If there are sufficient votes for one or
more (model, basis) combinations, then a subsequent stage attempts to verify the
presence of a model with the designated basis matching the chosen basis points.
In  the  case  where  model  points  are  missing  from  the  image  because  they  are 
obscured,  recognition  is  still  possible,  as  long  as  there  is  a  sufficient  number  of 
points  hashing  to  the  correct  hash  table  bins.  The  list  of  entries  in  each  hash 
table  bin  may  be  large,  but  because  there  are  many  possible  models  and  basis 
sets,  the  likelihood  that  a  single  model  and  single  basis  set  will  receive  multiple 
votes  is  quite  small,  unless  a  configuration  of  transformed  points  coincides  with  a 
model.  In  general,  we  do  not  expect  the  voting  scheme  to  give  only  one  candidate 
solution  (see  [60]).  The  goal  of  the  voting  scheme  is  to  act  as  a  sieve  and  reduce 
significantly  the  number  of  candidate  hypotheses  for  the  verification  step. 
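The two phases can be combined into a small end-to-end sketch for the similarity case. Everything here is an illustrative assumption: the model coordinates, the bin size `BIN`, and the helper names. The off-line phase stores a (model, basis) entry for every remaining point and every ordered basis; the on-line phase picks one image basis and casts one vote for every entry in the bin of each remaining image point.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

BIN = 0.25  # hypothetical quantization step

def coords(p, a, b):
    """Quantized similarity-invariant coordinates of p in the frame of basis (a, b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    n2 = dx * dx + dy * dy
    rx, ry = p[0] - (a[0] + b[0]) / 2, p[1] - (a[1] + b[1]) / 2
    u, v = (rx * dx + ry * dy) / n2, (-rx * dy + ry * dx) / n2
    return math.floor(u / BIN), math.floor(v / BIN)

def build_table(models):
    """Off-line phase: models maps a model id to {point label: (x, y)}."""
    table = defaultdict(list)
    for m, pts in models.items():
        for i, j in permutations(pts, 2):          # every ordered basis
            for k in pts:
                if k not in (i, j):
                    table[coords(pts[k], pts[i], pts[j])].append((m, (i, j)))
    return table

def recognize(table, image, a, b):
    """On-line phase: image points a, b form the basis; every entry in the
    bin of each remaining image point receives one vote."""
    votes = Counter()
    for p in image:
        if p not in (a, b):
            votes.update(table.get(coords(p, a, b), []))
    return votes

models = {
    "M1": {1: (0, 0), 2: (4, 0), 3: (1, 3), 4: (3, 2), 5: (2, 5)},
    "M2": {1: (0, 0), 2: (5, 1), 3: (2, -3), 4: (-1, 4)},
}
table = build_table(models)
# Image: M1 scaled by 2 and shifted by (3, 1), plus one clutter point.
image = [(3, 1), (11, 1), (5, 7), (9, 5), (7, 11), (20, -4)]
votes = recognize(table, image, (3, 1), (11, 1))
print(votes[("M1", (1, 2))])  # 3
```

With the image basis chosen on two points that belong to $M_1$, the hypothesis ($M_1$, (1, 2)) collects one vote per remaining model point, while the clutter point casts no vote for it. Exact bin agreement here relies on the noise-free, power-of-two scaling of this toy example; with noise or rotation, points near a bin boundary may fall into a neighboring bin.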

For  the  algorithm  to  be  successful  it  is  sufficient  to  select  as  a  basis  tuple  any 
set  of  image  points  belonging  to  some  model.  It  is  not  necessary  to  hypothesize  a 
correspondence between specific model points and specific scene points, since all
models  and  basis  pairs  are  redundantly  stored  within  the  hash  table.  Classification 
or  perceptual  grouping  of  features  can  be  used  to  make  the  search  over  scene 
features  more  efficient,  for  example,  by  making  use  of  only  special  basis  tuples. 

2.2.1.1 The Steps Behind the Idea

We  described  above  how  one  conceptually  determines  the  index  of  a  hash  table 
bin  during  the  two  phases  of  geometric  hashing  for  the  special  case  of  similarity 
transformations.  We  next  examine  in  greater  detail  the  actual  algorithm,  and  also 
give  the  hash  functions  for  the  set  of  transformations  that  we  will  be  considering 
in  this  dissertation.  Note  that  since  both  the  model  and  the  input  data  come 
from  a  2D  grayscale  image,  the  patterns  of  point  features  will  be  planar. 
Case of Rigid Transformations. Assume that the patterns of point features
corresponding  to  the  different  models  can  undergo  only  rigid  transformations, 
i.e.,  rotation  and  translation.  A  rigid  transformation  of  any  such  pattern  can  be 
uniquely  defined  by  the  transformation  of  two  points. 


Assume that we are given a set of n point features belonging to one of the
models of our database, e.g. $M_1$, and let $\mathbf{p}_{\mu_1}$ and $\mathbf{p}_{\mu_2}$ be an ordered pair of
points from that set. Then the vectors
$\mathbf{p}_x = (\mathbf{p}_{\mu_2} - \mathbf{p}_{\mu_1}) / \| \mathbf{p}_{\mu_2} - \mathbf{p}_{\mu_1} \|$ and
$\mathbf{p}_y = \mathrm{Rot}_{90}(\mathbf{p}_x)$¹ form an orthonormal basis, and thus a coordinate
system $Oxy$ (see Figure 2.5). Any point $\mathbf{p}$ in the plane can be represented in this
basis; namely, there is a unique pair of scalars $(u, v)$ such that

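$$\mathbf{p} - \mathbf{p}_0 = u\,\mathbf{p}_x + v\,\mathbf{p}_y \qquad (2.1)$$

where $\mathbf{p}_0 = (\mathbf{p}_{\mu_1} + \mathbf{p}_{\mu_2})/2$ is the midpoint between $\mathbf{p}_{\mu_1}$ and $\mathbf{p}_{\mu_2}$. In other
words, the center of the coordinate system $Oxy$ coincides with $\mathbf{p}_0$.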

The parameters $u$ and $v$ are the coordinates of the point $p$ in the coordinate system defined by the basis pair. Suppose we apply a rigid transformation $T$ to all of the $n$ points. Then, with respect to the basis defined by the transformed pair $(Tp_{\mu_1}, Tp_{\mu_2})$, the coordinates of $Tp$ will again be $(u, v)$. That is, $(u, v)$ is invariant with respect to the transformation $T$, provided that the corresponding points are chosen as the basis pair and the corresponding point is represented in that basis.

¹ I.e., $\mathbf{p}_y$ is the vector obtained by rotating $\mathbf{p}_x$ counter-clockwise by 90 degrees.
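As an illustration of Eqn. 2.1, the Python sketch below computes the rigid-case invariants $(u, v)$ of a point with respect to an ordered basis pair. It then checks numerically that these invariants do not change when the same rotation and translation are applied to all points. The function names and the sample points are illustrative assumptions, not part of the original system.

```python
import math

def rigid_invariant(p, b1, b2):
    """Coordinates (u, v) of p in the frame of Eqn. 2.1 defined by basis pair (b1, b2)."""
    dx, dy = b2[0] - b1[0], b2[1] - b1[1]
    length = math.hypot(dx, dy)
    ex, ey = dx / length, dy / length            # unit vector p_x along b1 -> b2
    # p_y = Rot90(p_x) = (-ey, ex); the origin p0 is the midpoint of the basis pair.
    rx = p[0] - (b1[0] + b2[0]) / 2
    ry = p[1] - (b1[1] + b2[1]) / 2
    # The basis is orthonormal, so u and v are plain dot products.
    return (rx * ex + ry * ey, -rx * ey + ry * ex)

def rigid_transform(p, theta, t):
    """Rotate p counter-clockwise by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

def T(q):
    # An arbitrary rigid motion chosen for the check.
    return rigid_transform(q, 0.7, (5.0, -2.0))

b1, b2, p = (0.0, 0.0), (2.0, 0.0), (1.0, 3.0)
u, v = rigid_invariant(p, b1, b2)                # (0.0, 3.0)
u_t, v_t = rigid_invariant(T(p), T(b1), T(b2))   # equal to (u, v) up to rounding
```

Because the basis vectors are unit length, $(u, v)$ are absolute distances. For this reason the rigid invariants are not preserved under scaling.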

We  will  represent  the  object  points  by  their  coordinates  in  all  possible  basis 
tuples.  This  is  because  for  any  given  tuple,  it  is  possible  that  one  of  the  points  will 
be  obscured,  and  thus  the  specified  tuple  may  not  be  present.  More  specifically, 
the  following  preprocessing  is  carried  out: 

Preprocessing  Phase 

For each model $m$ do:

(1) Extract the model’s point features. Assume that $n$ such features are found.

(2) For each ordered pair $(p_{\mu_1}, p_{\mu_2})$ of point features do:

(a) Compute the coordinates $(u, v)$ of the remaining features in the coordinate frame defined by the basis $(p_{\mu_1}, p_{\mu_2})$.

(b) After a proper quantization, use the tuple $(u, v)$ as an index into a two-dimensional hash table data structure. In the corresponding hash table bin, insert the information $(m, (p_{\mu_1}, p_{\mu_2}))$, namely, the model number and the basis tuple that was used to determine $(u, v)$.


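If $M$ models are to be inserted in the database, the time complexity of this step is $O(Mn^3)$. Each model contributes $n(n-1)$ ordered basis pairs, and for each pair the coordinates of the remaining $n-2$ features are computed and inserted.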
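A minimal sketch of the preprocessing phase for the rigid case follows. It makes these assumptions:

- Models are given as lists of distinct 2D points.
- Quantization rounds to a grid whose cell size `q` is hypothetical.
- The basis is stored by feature indices rather than by coordinates.

The quantization scheme and all names are assumptions for illustration. The three nested loops make the $O(Mn^3)$ cost visible.

```python
import math
from collections import defaultdict
from itertools import permutations

def rigid_invariant(p, b1, b2):
    """(u, v) of p in the orthonormal frame of Eqn. 2.1, centered at the basis midpoint."""
    dx, dy = b2[0] - b1[0], b2[1] - b1[1]
    length = math.hypot(dx, dy)
    ex, ey = dx / length, dy / length
    rx, ry = p[0] - (b1[0] + b2[0]) / 2, p[1] - (b1[1] + b2[1]) / 2
    return (rx * ex + ry * ey, -rx * ey + ry * ex)

def quantize(uv, q):
    """Map continuous invariants to an integer bin index on an assumed grid of cell size q."""
    return (round(uv[0] / q), round(uv[1] / q))

def preprocess(models, q=0.25):
    """Build the hash table: bin -> list of (model number, (basis index 1, basis index 2))."""
    table = defaultdict(list)
    for m, pts in enumerate(models):                        # M models
        for i, j in permutations(range(len(pts)), 2):       # n(n-1) ordered basis pairs
            for k, p in enumerate(pts):                     # the remaining n-2 features
                if k not in (i, j):
                    key = quantize(rigid_invariant(p, pts[i], pts[j]), q)
                    table[key].append((m, (i, j)))
    return table

models = [[(0, 0), (2, 0), (1, 3), (3, 1)], [(0, 0), (1, 1), (4, 0)]]
table = preprocess(models)
n_entries = sum(len(entries) for entries in table.values())  # 4*3*2 + 3*2*1 = 30
```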

During  the  recognition  phase,  the  algorithm  uses  the  hash  table  data  structure 
that  was  prepared  during  the  preprocessing  phase.  This  phase  proceeds  as  follows: 
Recognition Phase

When  presented  with  an  input  image, 

(1) Extract the various points of interest. Assume that $\mathcal{S}$ is the set of the interest points found, and let $S$ be the cardinality of $\mathcal{S}$.

(2) Choose an arbitrary ordered pair, $(p_{\mu_1}, p_{\mu_2})$, of interest points in the image.

(3)  Compute  the  coordinates  of  the  remaining  interest  points  in  the  coordinate  system 
Oxy  that  the  selected  basis  defines. 

(4)  For  each  such  coordinate  check  the  appropriate  hash  table  bin,  and  for  every  entry 
found  there,  cast  a  vote  for  the  model  and  the  basis. 

(5)  Histogram  all  the  hash  table  entries  that  received  one  or  more  votes  during  step  (4). 
Proceed  to  determine  those  entries  that  received  more  than  a  certain  number 
(threshold)  of  votes:  each  such  entry  corresponds  to  a  potential  match. 

(6) For each potential match discovered in step (5), consider all the model-image feature pairs that voted for the particular entry. Recover the transformation $T$ that results in the best least-squares match between all these corresponding feature pairs. Since this transformation is computed from more than two point feature pairs, it will generally be more accurate than one computed from the basis pair alone.

(7) Transform the edges and higher-order features of the model according to the recovered transformation $T$, and verify them against the input image features. If the verification fails, go back to step (2) and repeat the procedure using a different image basis pair.
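The voting and thresholding steps (2) through (5) can be sketched as follows for a single image basis pair, again for the rigid case. The sketch is self-contained, so it repeats the invariant and table-building helpers. The cell size, the threshold value and the synthetic scene (model 0 rotated and translated) are illustrative assumptions. The least-squares fit of step (6) and the subsequent verification step are omitted.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

def rigid_invariant(p, b1, b2):
    dx, dy = b2[0] - b1[0], b2[1] - b1[1]
    length = math.hypot(dx, dy)
    ex, ey = dx / length, dy / length
    rx, ry = p[0] - (b1[0] + b2[0]) / 2, p[1] - (b1[1] + b2[1]) / 2
    return (rx * ex + ry * ey, -rx * ey + ry * ex)

def quantize(uv, q):
    return (round(uv[0] / q), round(uv[1] / q))

def preprocess(models, q):
    table = defaultdict(list)
    for m, pts in enumerate(models):
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i, j):
                    table[quantize(rigid_invariant(p, pts[i], pts[j]), q)].append((m, (i, j)))
    return table

def vote(scene, i, j, table, q):
    """Steps (3)-(4): one vote for every (model, basis) entry found in each probed bin."""
    votes = Counter()
    for k, p in enumerate(scene):
        if k in (i, j):
            continue
        for entry in table.get(quantize(rigid_invariant(p, scene[i], scene[j]), q), []):
            votes[entry] += 1
    return votes

def transform(p, theta=0.7, t=(5.0, -2.0)):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

q = 0.25
models = [[(0, 0), (2, 0), (1, 3), (3, 1)], [(0, 0), (1, 1), (4, 0)]]
table = preprocess(models, q)
scene = [transform(p) for p in models[0]]     # model 0 seen under a rigid motion
votes = vote(scene, 0, 1, table, q)           # step (2): image basis = scene points 0 and 1
threshold = 2                                 # assumed threshold for step (5)
matches = [entry for entry, c in votes.items() if c >= threshold]
```

Here `matches` should contain the single entry `(0, (0, 1))`: model 0 with the basis formed by its first two features.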
The time complexity of the recognition phase is $O(S^2 M n^3)$ in the worst case. This complexity arises because all $O(S^2)$ possible image basis pairs may have to be considered, each requiring the histogramming of $O(Mn^3)$ data items. In practice, though, a handful of basis selections will suffice, and only a small percentage of all hash table entries will be histogrammed for each such selection.

Case of Similarity Transformations. Assume now that the patterns of point features corresponding to the different models can undergo similarity transformations, i.e., rotation, translation and scaling. Again, a similarity transformation of any such pattern can be uniquely defined by the transformation of two points.

The analysis is the same as in the case of the rigid transformation, with one simple modification. When calculating the transformed coordinates of a point with respect to a basis pair, we normalize the basis pair to have unit length, i.e., coordinates are measured in units of the distance between the two basis points. The basis of the system $Oxy$ (see Figure 2.5) defined by the basis pair will now be $\mathbf{p}_x^s = p_{\mu_2} - p_{\mu_1}$ and $\mathbf{p}_y^s = \mathrm{Rot}_{90}(\mathbf{p}_x^s)$, respectively. These two vectors are orthogonal and of equal length, but no longer of unit length. Any point $p$ in the plane can be represented in this basis. Namely, there is a unique pair of scalars $(u, v)$ such that

$$p - p_0 = u\,\mathbf{p}_x^s + v\,\mathbf{p}_y^s \qquad (2.2)$$

where $p_0 = \frac{1}{2}(p_{\mu_1} + p_{\mu_2})$ is the midpoint between $p_{\mu_1}$ and $p_{\mu_2}$. In other words, the center of the coordinate system $Oxy$ coincides with $p_0$.

The operations during the preprocessing and recognition phases are precisely the same as described in the case of rigid transformations, with the understanding that Eqn. 2.2 is now used to determine the invariants $(u, v)$. The time complexities of the preprocessing and recognition phases remain the same as in the case of the rigid transformation.
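A sketch of Eqn. 2.2 in Python follows. Because $\mathbf{p}_x^s$ and $\mathbf{p}_y^s$ are orthogonal and have the same length, each coordinate is a projection divided by $\|\mathbf{p}_x^s\|^2$. The check applies an arbitrary, hypothetically chosen scaling, rotation and translation to all points. The function names are assumptions for illustration.

```python
import math

def similarity_invariant(p, b1, b2):
    """(u, v) solving p - p0 = u*p_x + v*p_y (Eqn. 2.2), with p_x = b2 - b1 and p_y = Rot90(p_x)."""
    ax, ay = b2[0] - b1[0], b2[1] - b1[1]      # p_x, deliberately not normalized
    sq_len = ax * ax + ay * ay                 # |p_x|^2, which equals |p_y|^2
    rx = p[0] - (b1[0] + b2[0]) / 2
    ry = p[1] - (b1[1] + b2[1]) / 2
    # Project onto p_x and p_y = (-ay, ax), then divide by the common squared length.
    return ((rx * ax + ry * ay) / sq_len, (-rx * ay + ry * ax) / sq_len)

def similarity_transform(p, scale, theta, t):
    """Scale by `scale`, rotate counter-clockwise by theta, then translate by t."""
    c, s = scale * math.cos(theta), scale * math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

def T(q):
    return similarity_transform(q, 3.0, 1.1, (-4.0, 7.0))

b1, b2, p = (0.0, 0.0), (2.0, 0.0), (1.0, 3.0)
u, v = similarity_invariant(p, b1, b2)              # (0.0, 1.5): v is in units of the basis length
u_t, v_t = similarity_invariant(T(p), T(b1), T(b2))
```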


Case of Affine Transformations. The case where the patterns of point features corresponding to the different models can undergo a general affine transformation (a linear transformation followed by a translation) is slightly different. An affine transformation of any such pattern can be uniquely defined by the transformation of three points instead of two.


Assume that we are given a set of $n$ point features belonging to one of the models of our database. Let $p_{\mu_1}$, $p_{\mu_2}$ and $p_{\mu_3}$ be an ordered triplet of non-collinear points from that set. Then the vectors $\mathbf{p}_x^a = p_{\mu_2} - p_{\mu_1}$ and $\mathbf{p}_y^a = p_{\mu_3} - p_{\mu_1}$ form a skewed basis, and thus a skewed coordinate system $Oxy$ (see Figure 2.6). Any point $p$ in the plane can be represented in this basis. Namely, there is a unique pair of scalars $(u, v)$ such that

$$p - p_0 = u\,\mathbf{p}_x^a + v\,\mathbf{p}_y^a \qquad (2.3)$$

where $p_0 = \frac{1}{3}(p_{\mu_1} + p_{\mu_2} + p_{\mu_3})$ is the barycenter of the triangle formed by $p_{\mu_1}$, $p_{\mu_2}$ and $p_{\mu_3}$. In other words, the center of the coordinate system $Oxy$ coincides with $p_0$.

The parameters $u$ and $v$ are the coordinates of the point $p$ in the coordinate system defined by the skewed basis. Suppose we apply an affine transformation $T$ to all of the $n$ points. Then, with respect to the skewed basis defined by the transformed triplet $(Tp_{\mu_1}, Tp_{\mu_2}, Tp_{\mu_3})$, the coordinates of $Tp$ will again be $(u, v)$. That is, $(u, v)$ is invariant with respect to the transformation $T$, provided that the corresponding triplet of points is chosen to define the skewed coordinate system and the corresponding point is represented in that basis.

The operations during the preprocessing and recognition phases are precisely the same as described in the case of rigid transformations, with two differences: (i) Eqn. 2.3 is now used to determine the invariants $(u, v)$, and (ii) the basis tuple is a triplet. For the affine case, the time complexity of the preprocessing phase is now $O(Mn^4)$, whereas that of the recognition phase is $O(S^3 M n^4)$.
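Eqn. 2.3 is a $2 \times 2$ linear system in $(u, v)$. The Python sketch below solves it by Cramer's rule, which requires the basis triplet to be non-collinear. It then checks invariance under an arbitrarily chosen non-singular affine map. The names and the sample values are assumptions for illustration.

```python
def affine_invariant(p, b1, b2, b3):
    """(u, v) solving p - p0 = u*(b2 - b1) + v*(b3 - b1) (Eqn. 2.3), p0 = barycenter of the triplet."""
    ax, ay = b2[0] - b1[0], b2[1] - b1[1]     # p_x^a
    bx, by = b3[0] - b1[0], b3[1] - b1[1]     # p_y^a
    det = ax * by - ay * bx
    if det == 0:
        raise ValueError("basis triplet is collinear")
    dx = p[0] - (b1[0] + b2[0] + b3[0]) / 3
    dy = p[1] - (b1[1] + b2[1] + b3[1]) / 3
    # Cramer's rule for [p_x^a  p_y^a] (u, v)^T = p - p0
    return ((dx * by - dy * bx) / det, (ax * dy - ay * dx) / det)

def affine_transform(p, A, t):
    """Apply p -> A p + t for a 2x2 matrix A given as a pair of rows."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + t[0],
            A[1][0] * p[0] + A[1][1] * p[1] + t[1])

b1, b2, b3, p = (0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (2.0, 2.0)
u, v = affine_invariant(p, b1, b2, b3)             # both equal to 1/3
A, t = ((2.0, 1.0), (0.5, 3.0)), (1.0, -2.0)       # arbitrary non-singular linear part
u_t, v_t = affine_invariant(*(affine_transform(x, A, t) for x in (p, b1, b2, b3)))
```

The check works because an affine map sends the barycenter of the triplet to the barycenter of the transformed triplet.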


Some General Observations. Before we conclude this section, we make several observations. The first concerns the position of the center of the coordinate system $Oxy$ defined by the chosen basis. In all three cases above, we made a seemingly arbitrary selection for that position. As we will see in section 4.3, there is a well-founded justification behind our choice.


Name         c   Allowed Transformation              Preproc. Compl.   Recog. Compl.
Fixed        1   Translation                         $O(Mn^2)$         $O(S^2 Mn^2)$
Rigid        2   Rotation / Translation              $O(Mn^3)$         $O(S^3 Mn^3)$
Similarity   2   Rotation / Translation / Scaling    $O(Mn^3)$         $O(S^3 Mn^3)$
Affine       3   Linear / Translation                $O(Mn^4)$         $O(S^4 Mn^4)$

Table 2.1 The time complexities for the preprocessing and recognition phases, for several transformations.


The second observation concerns the time complexities of the preprocessing and recognition phases. If $c$ denotes the cardinality of the basis tuple, then the time complexity of the preprocessing phase is $O(Mn^{c+1})$, whereas the complexity of the recognition phase is $O(S^{c+1} M n^{c+1})$. Table 2.1 summarizes these results.
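The preprocessing bound $O(Mn^{c+1})$ follows from a count. Each model has $n!/(n-c)!$ ordered basis tuples and $n-c$ remaining features per tuple. The short Python sketch below computes the exact number of preprocessing entries. The values $M = 10$ and $n = 20$ are illustrative assumptions. It uses `math.perm` (Python 3.8 and later).

```python
from math import perm

def hash_entries(M, n, c):
    """Exact entry count: M models x n!/(n-c)! ordered basis tuples x (n - c) other features."""
    return M * perm(n, c) * (n - c)

M, n = 10, 20
for name, c in [("Fixed", 1), ("Rigid/Similarity", 2), ("Affine", 3)]:
    exact = hash_entries(M, n, c)
    bound = M * n ** (c + 1)
    print(f"{name:16s} c={c}: {exact:>9,d} entries (M n^(c+1) = {bound:,d})")
```

The exact counts (3,800, 68,400 and 1,162,800 for these values) stay below $Mn^{c+1}$ and grow at the same rate in $n$.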

Also, recall from the previous section that in the context of geometric hashing, the hash functions assume values in the set $\mathbb{R}^2$. More generally, a geometric hashing function can have the form

$$h : \mathbb{R}^k \to \mathbb{R}^l,$$

with $k, l \in \mathbb{Z}^+$. The hash function $h$ is then said to be $l$-dimensional, and the range of $h$ will be referred to as the “hash space,” or “space of invariants.”

2.3  Geometric  Hashing  Systems 

We conclude this chapter with a historical summary of the development of geometric hashing methods for object recognition. This discussion is mostly limited to treatments that led directly to the geometric hashing method or used the geometric hashing terminology. Related ideas and essentially equivalent concepts have, of course, occurred frequently in the development of object recognition systems. For example, the “feature sphere” used in Chen and Kak’s 3D-POLY [23] is essentially a hash function that permits indexing into a smaller set of models. Likewise, work by researchers at the IBM T.J. Watson Research Center has been based on ideas of indexing into model bases for many years [14,20,21].

Geometric hashing shares certain philosophical underpinnings with Hough transform methods, and was in part motivated by related works [5,6]. The difference is that Hough transform methods search over the space of transformations, whereas hashing uses model/scene matches to establish interpretations. The idea of geometric hashing, at least in its modern incarnation, has its origins in the work of Professor Jacob Schwartz [52].

The first efforts concentrated on the recognition of 2D objects from their silhouettes; hence, efficient curve-matching techniques were developed. The use of “footprints” to describe properties along the curves was later extended by Wolfson and Hong [42]. This resulted in a recognition system that was able to recognize about ten 2D objects partially occluding each other. The objects were taken from a library of a hundred models, and recognition was performed allowing planar rigid motion (rotation and translation).

A landmark in the application of curve-matching and combinatorial optimization methods was their use to assemble (graphically rather than physically) all the pieces of two hundred-piece commercial jigsaw puzzles, from separate photographs of their individual pieces [101]. The assembly was based on shape information only. In all two-dimensional curve-matching work, footprints were used to limit the number of candidate curves accessed by the matching system.

However, hash functions (still called footprint information) were much more essential for matching 3D curves obtained from depth data of objects. Wolfson and Kishon used depth data obtained from a fast but approximate depth sensor [22] to develop a practical method for locating and matching curves on rigid 3D objects [54]. They extended the work by using a different hashing technique [84]. All 3D curve-matching systems used measures of the local curvature as index values into a table.

Geometric hashing was introduced as an approach to model-based object recognition by Lamdan, Schwartz and Wolfson. Much of the work is summarized in the dissertation of Lamdan [58]. Efficient algorithms were developed for recognition of flat rigid objects assuming the affine approximation of the perspective transformation [61,62]. The technique was also extended to the recognition of arbitrary rigid 3D objects from single 2D images [59]. An integrated discussion of both the object-matching and the curve-matching aspects can be found in [45], and related demonstrations of geometric hashing applications are described in [61,62].

Geometric hashing systems have since been built and explored by many research groups. It is fair to say that most implementations of geometric hashing systems seem to work as well as classical model-based vision systems, and deliver on the promise of greater efficiency.

Stein and Medioni [89] present a system for the recognition of planar objects from intensity images. The system uses a hash table that contains the gray-encodings of groups of consecutive edge segments (“supersegments”) of varying cardinalities.

For recognition of general 3D objects from single 2D images, promising results have been obtained in two works:

- the dissertation of Lamdan [58], where many viewpoint-centered models are generated from simple 3D models;
- the work of Gavrila and Groen [34], who generate viewpoint-centered models based on experiments that determine limits of discriminability.

Stein and Medioni’s TOSS system [90] uses “structural hashing” to recognize 3D shapes from dense range data, from which characteristic curves and local differential patches are extracted. The method allows only for rigid transformations (rotation and translation), and despite the use of high-dimensional indices, the verification stage is very costly.

Forsyth  et  al.  [33]  present  and  use  descriptors  based  on  pairs  of  planar  curves; 
the  descriptors  are  invariant  under  affine  and  perspective  transformations.  They 
obtain  good  results,  but  their  method  is  sensitive  to  occlusion  and  its  performance 
depends  strongly  on  the  quality  of  the  segmentation. 

For the recognition of rigid 3D curves extracted from medical imagery, Gueziec makes use of hashing methods to speed up matching based on spline curve approximations [40].

For recognition of 3D objects from range data, Flynn and Jain [32] use hashing and local feature sets to generate hypotheses (without a voting procedure). Applied to two dozen models, their method gives a more efficient search than a constrained search (hypothesize-and-verify) approach.

Sossa and Horaud [86] represent 3D objects by means of their characteristic views and regard object recognition as a graph matching problem. They develop hash functions for graphs to provide rapid matching capabilities.

Parallel  implementations  of  geometric  hashing,  for  a  Connection  Machine, 
have  been  attempted  by  Medioni  [17].  A  parallel  implementation  developed  for 
this  dissertation  has  been  described  elsewhere  [78,79,80,81]. 

There have been numerous efforts to add an error model to geometric hashing, and to investigate its performance in the presence of positional noise of the features.

- Costa et al. [25] investigate the variation of the standard hash functions in the presence of noise, and suggest a weighted-voting scheme.
- Lamdan and Wolfson [60] investigate the false-alarm rate both analytically and empirically. They conclude that acceptable filtering is possible, although a degradation in performance can be expected for affine-invariant matching.
- Grimson and Huttenlocher [36] give pessimistic predictions for affine-invariant matching using geometric hashing.
- Rigoutsos and Hummel [79,80] look at error rates in the presence of noise for both similarity and affine invariance, and conclude that weighted voting is essential for recognition in the affine case.
- Gavrila and Groen [34] report good filtering capabilities of similarity-invariant model matching in the presence of noise.

Fischer et al. apply the geometric hashing method to the problem of structural comparison of proteins [29].
There are also a number of corporate research groups using geometric hashing methods for projects funded by the US Department of Defense. We are aware of geometric hashing investigations at I-Math Associates in Orlando, Florida, at Martin Marietta Denver, and at The Analytic Sciences Corporation.
Chapter 3


Exploiting  Parallelism 


In this chapter, we examine the issue of parallelism in the context of geometric hashing. Two parallel algorithms that realize the geometric hashing method are described. In addition to these two algorithms, a novel hypercube-based radix-sort algorithm as well as a number of general performance enhancements are presented.

The  first  algorithm  is  based  on  a  “connectionist”  view  of  the  geometric  hashing 
method  and  is  designed  for  an  SIMD  hypercube-based  machine.  The  algorithm 
is  data  parallel  over  the  hash  table  entries  and  regards  geometric  hashing  as  a 
connectionist  algorithm  with  information  flowing  via  patterns  of  communication. 

The second algorithm is more general, and relies on a “data broadcast” approach. This algorithm is data parallel over combinations of small subsets of model features, and is inspired by the method of inverse indexing for data retrieval [87]. The algorithm treats the parallel architecture as a source of “intelligent memory.”

Both algorithms are sequential over the observed image features. This is a fundamental design decision. The geometric hashing method greatly speeds the search over the database containing the models and the anchor points within the models, but it still requires a search over the set of features in the image. However, once a set of candidate features (a basis tuple) is selected such that the features belong to a model embedded in the scene, recognition of the model will follow.

An important step during the recognition phase is the histogramming of those hash table entries that received one or more votes during the voting process. One approach to histogramming is sorting; we describe a novel and simple hypercube-based radix-sort algorithm for this purpose.

3.1  Parallelizability  of  Geometric  Hashing 

One of the major advantages of the geometric hashing method is that it is inherently parallelizable. Its parallel nature manifests itself in both the preprocessing (off-line) and the recognition (on-line) phases of the algorithm. There are several ways that the algorithm can be parallelized, depending on the mapping of data items to the available processing elements (PEs).

Let  us  recall  the  description  of  the  preprocessing  phase  from  section  2.2.1. 
We  assume  that  the  database  will  contain  M  models  each  consisting  of  n  point 
features,  and  that  the  models  are  allowed  to  undergo  similarity  transformations 
(thus,  two  points  are  needed  to  define  a  basis  tuple). 

Parallelism in the Preprocessing Phase. During the creation of the hash table data structure, and for a given model and basis selection, we compute the hash invariants for each subset of model features. Each such subset consists of the selected basis and one of the remaining $(n-2)$ model features. Clearly, this computation can proceed in parallel.

Depending on the number of available PEs, the computations corresponding to the $n(n-1)$ distinct ordered bases may also proceed in parallel. With $n(n-1)(n-2)$ PEs, each PE computes one hash location in $O(1)$ time. Finally, if at least $Mn(n-1)(n-2)$ PEs are available, the hash invariants corresponding to all possible (basis, model-feature) combinations can be computed in $O(1)$ parallel time.
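One way to realize this $Mn(n-1)(n-2)$-way data parallelism is to give every PE a flat index and let it decode its own (model, basis, feature) assignment. The Python sketch below shows such a decoding for the similarity case. The index layout and the function name are assumptions for illustration. The list comprehension at the end is a serial stand-in for what the PEs would do simultaneously.

```python
def decode_pe(pe, n):
    """Map a flat PE index to (model, i, j, k): ordered basis (p_i, p_j) and feature p_k, all distinct."""
    per_model = n * (n - 1) * (n - 2)
    m, r = divmod(pe, per_model)
    i, r = divmod(r, (n - 1) * (n - 2))
    j_rank, k_rank = divmod(r, n - 2)
    j = j_rank + (1 if j_rank >= i else 0)       # j ranges over indices other than i
    lo, hi = min(i, j), max(i, j)
    k = k_rank
    if k >= lo:                                   # skip the two basis indices in increasing order
        k += 1
    if k >= hi:
        k += 1
    return m, i, j, k

M, n = 2, 5
# Serial stand-in: on the parallel machine each PE would evaluate decode_pe on its own index.
tasks = [decode_pe(pe, n) for pe in range(M * n * (n - 1) * (n - 2))]
```

Every index maps to a distinct valid combination, so no two PEs compute the same invariant.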

Once a set of hash invariants is available, the relevant (model, basis) information can be stored in the appropriate hash bins, again in parallel. This may result in contention when more than one PE, having computed the same hash invariant, attempts to deposit distinct (model, basis) tuples in the same hash bin. A simple protocol can be used to impose an arbitrary ordering on the competing PEs.

Parallelism in the Recognition Phase. During the recognition phase, $S$ features from the feature extraction process are presented. Once a basis has been selected, hash invariants must be computed for each set of features composed of the selected basis and one of the remaining $S-2$ scene features. Assuming the availability of at least $S$ PEs, the computation of these invariants can proceed in parallel, and the appropriate hash bins are notified.

If there are at least as many PEs as hash bins, then the list of entries associated with each bin can be stored in the local memory of the PE that controls that bin. The local lists can then be traversed in parallel, with the longest such list dominating the time required for the traversal. Finally, as we will see in section 3.4, the histogramming of the votes received by the various (model, basis) combinations can also proceed in parallel.
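Under the one-PE-per-bin mapping, the cost of traversing the hit bins in parallel is set by the longest list. A serial implementation, by contrast, pays for the sum of the list lengths. The toy Python sketch below makes this comparison for a hypothetical table and set of hit bins. It counts steps as a stand-in for time, a simplification that ignores communication costs.

```python
def traversal_steps(table, hit_bins):
    """Return (parallel_steps, serial_steps) for walking the entry lists of distinct hit bins."""
    lengths = [len(table.get(b, [])) for b in hit_bins]
    return max(lengths, default=0), sum(lengths)

# Hypothetical bins and entries, for illustration only.
table = {(0, 12): ["e1"], (8, 4): ["e1", "e2", "e3"], (5, 5): ["e4", "e5"]}
parallel, serial = traversal_steps(table, [(0, 12), (8, 4), (5, 5)])
```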
In the above description, we use only one basis pair during the recognition phase. However, by replicating the coordinates of the $S$ interest points, we can employ multiple bases at once, at the expense of additional bookkeeping on a per-processor basis.

For example, if the input images have a maximum of 200 interest points, as many as 256 bases can be processed at the same time on a 64K-processor machine (256 × 200 = 51,200 ≤ 65,536). The PE assigned to a hash bin must maintain a separate counter for each of the 256 bases. The $i$-th such counter counts the number of times that the bin is hit by invariants computed in the coordinate system defined by the $i$-th basis. During the histogramming step, messages must be “tagged” with one of 256 possible identifiers before performing the evidence accumulation.
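The per-bin, per-basis counters and the tagged evidence accumulation can be sketched serially in Python as follows. The constant `NUM_BASES` (4 here rather than 256), the message format `(bin, tag)` and the function names are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

NUM_BASES = 4  # hypothetical; the example in the text uses 256 tags

def count_hits(messages, num_bases=NUM_BASES):
    """Each bin's PE keeps one counter per basis tag; messages are (bin, tag) pairs."""
    counters = defaultdict(lambda: [0] * num_bases)
    for hash_bin, tag in messages:
        counters[hash_bin][tag] += 1
    return counters

def accumulate(counters, table):
    """Evidence accumulation: credit each (tag, model, basis) with its bin's per-tag hit count."""
    votes = Counter()
    for hash_bin, per_tag in counters.items():
        for tag, hits in enumerate(per_tag):
            if hits:
                for model, basis in table.get(hash_bin, []):
                    votes[(tag, model, basis)] += hits
    return votes

table = {(0, 12): [(0, (0, 1))], (8, 4): [(0, (0, 1)), (1, (2, 0))]}
messages = [((0, 12), 0), ((8, 4), 0), ((8, 4), 2), ((5, 5), 1)]
votes = accumulate(count_hits(messages), table)
```

Keeping the tag in the vote key ensures that votes computed in different image bases are never mixed.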

3.2  Some  Definitions 

We  will  now  give  several  definitions  that  will  facilitate  the  description  of  the  two 
algorithms. 

Typically, two separate components can be (logically and/or physically) identified in any SIMD supercomputer. One component is the array of processors executing the various instructions. The other component is a traditional sequential computer that controls the array of processors. The role of the sequential computer is to provide a familiar environment to the users of the supercomputer for program development and execution. User programs execute on the sequential computer, which translates the segments of code that are to be executed in parallel into sequences of instructions plus data. These sequences are subsequently relayed to the various processors of the array with the help of a communication network. We will be referring to the sequential machine as the host.
SIMD supercomputers are the typical target architectures for implementing data parallel algorithms. Data parallelism separates tasks to be performed concurrently according to the indices of items in the data structures participating in the algorithm. This indexing effectively associates one data item with one processor from the processor array.

The number of processors comprising the array is typically limited to a few tens of thousands, whereas in many applications the data sets have cardinalities on the order of millions of items. For this reason, software support provides the user with a virtual processor facility. This facility multiplexes the physical processors of the processor array in order to simulate a machine with many more processors than are actually available. The local memory of each physical processor is divided evenly among the virtual processors it simulates, and all physical processors simulate the same number of virtual processors. The number of virtual processors simulated by one physical processor will be referred to as the “virtual processor ratio,” or “VPR” [95]. A VPR value of $r$ incurs at least an $r$-fold slowdown in execution speed.

Finally, many applications consist of more than one data set, each mapped to the array of processors during the lifetime of the program. We will call the set of virtual processors to which a given data set is mapped a “virtual processor set,” or “VP set” [95] for short. At any time during program execution, more than one VP set may be in existence; these VP sets will correspond to distinct data sets. The physical processors simulating the different VP sets are shared rather than disjoint, so at most one VP set can be active at any moment. Distinct VP sets can communicate with one another either through message passing or by sharing variables.
3.3 Design Issue


Three  data  sets  that  can  be  identified  in  the  description  of  the  recognition  phase 
(section  3.1)  are:  the  set  of  the  coordinates  of  the  input  image  features,  the  set 
of  the  coordinate  tuples  corresponding  to  all  possible  basis  tuples  formed  from 
image  features,  and  the  set  of  hash  table  entries. 

The  discussion  implicitly  assumed  the  following  mapping: 

•  The  coordinates  of  the  image  features  comprised  the  first  VP  set; 

•  The  hash  entries  belonging  to  a  given  hash  bin  were  coalesced  together  into 
one  group;  each  such  group  was  mapped  to  one  processor,  thus  forming  the 
second  VP  set. 

The  coordinates  of  a  selected  basis  are  broadcast  by  the  host  to  the  processors 
that  implement  the  first  virtual  processor  set.  This  approach  makes  the  algorithm 
data  parallel  over  the  image  features,  and  serial  over  the  set  of  basis  tuples. 

Alternatively,  each  member  of  the  set  of  basis  tuples  could  be  associated  with 
one  processor;  the  coordinates  of  the  image  feature  points  could  then  either  be 
broadcast  by  the  host,  or  held  in  temporary  storage.  This  mapping  results  in  a 
different  approach  to  performing  geometric  hashing.  This  latter  approach  is  data 
parallel  over  the  set  of  basis  tuples,  and  serial  over  the  image  features. 

As can be seen, for a given problem there is relative flexibility in deciding which data sets are mapped to the available processors, and how. In our description of the parallel algorithms for performing geometric hashing, an alternative mapping will be presented.
3.4 Building-block Algorithms


Before we proceed with the description of the algorithms for performing geometric hashing in parallel, we present certain building-block algorithms. These are fundamental to the programming of a hypercube-based SIMD architecture and will be used as “subroutines” by our algorithms. We assume a concurrent-read-exclusive-write (CREW) model of computation: any pattern of concurrent reads to neighboring processors uses unit time. (Accesses are permitted along different dimensions in the same clock cycle.)

3.4.1 The p-product

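The first of the building-block algorithms is the one needed to perform a $p$-product. The $p$-product is defined as follows. Suppose we are given $p$ finite sets $A_1 = \{a_{1,i_1}\}_{i_1=1}^{\ell_1}$, $A_2 = \{a_{2,i_2}\}_{i_2=1}^{\ell_2}$, $A_3 = \{a_{3,i_3}\}_{i_3=1}^{\ell_3}$, $\ldots$, $A_p = \{a_{p,i_p}\}_{i_p=1}^{\ell_p}$. Their $p$-product $A_1 \times A_2 \times \cdots \times A_p$ is the set of all the ordered $p$-tuples $\{(a_{1,i_1}, a_{2,i_2}, \ldots, a_{p,i_p})\}_{i_1=1,\, i_2=1,\, \ldots,\, i_p=1}^{\ell_1,\, \ell_2,\, \ldots,\, \ell_p}$.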

One  way  to  compute  the  p-product  is  to  perform  an  outer  product  p  —  1  times. 
An  outer  product  for  the  Connection  Machine  is  succinctly  described  in  [66]. 
An  extension  of  the  method  leads  to  a  direct  p-product  computation,  which  we 
describe  next. 
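For reference, the Python sketch below builds the $p$-product from the definition by $p-1$ successive outer products. It checks the result against `itertools.product` from the standard library. It models only the result, not the hypercube data movement described below. The function names are illustrative.

```python
from itertools import product

def outer(tuples, elements):
    """One outer-product step: extend every tuple with every element."""
    return [t + (e,) for t in tuples for e in elements]

def p_product(*sets):
    """A_1 x A_2 x ... x A_p, computed as p - 1 successive outer products."""
    result = [(a,) for a in sets[0]]
    for s in sets[1:]:
        result = outer(result, s)
    return result

A1, A2, A3 = [1, 2], ["a", "b"], [True, False]
triples = p_product(A1, A2, A3)   # 2 * 2 * 2 = 8 ordered triples
```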

Using standard Gray-code embedding algorithms, we configure the hypercube as a $p$-dimensional array of size $\ell_1 \times \ell_2 \times \cdots \times \ell_p$. Let $x_j$, $j = 1, 2, 3, \ldots, p$, denote the $j$-th axis of this configuration. We assume, purely for convenience, that the $\ell_j$'s are powers of 2 and that a sufficient number of processors is available.

The processors are indexed by their coordinates $(z_1, z_2, \ldots, z_p)$. Initially, data element $a_{1,i}$ is contained in processor $(i, 0, \ldots, 0)$, $a_{2,i}$ in processor $(0, i, \ldots, 0)$, etc. In other words, the elements of set $A_j$ are spread along the $x_j$-axis, $j = 1, 2, \ldots, p$.

The algorithm has two phases. During the first phase, the $a_{1,i}$ data is spread
along a row in the direction of the $x_2$-axis, the $a_{2,i}$ data is spread in the direction
of the $x_3$-axis, etc. In general, the $a_{j,i}$ data is spread along a row in the direction
of the $x_{(j \bmod p)+1}$-axis. Figure 3.1 shows a simple example with three sets $A_1$, $A_2$,
and $A_3$ (the case of a triple product).

In the second phase, the data on each hyperplane is spread into the entire
hypercube, first spreading the data on the $(x_1, x_2, \ldots, x_{p-1})$-plane along the $x_p$-axis,
then the data on the $(x_2, x_3, \ldots, x_p)$-plane along the $x_1$-axis, etc. Finally,
the data on the $(x_p, x_1, \ldots, x_{p-2})$-plane is spread along the $x_{p-1}$-axis.

Upon completion, processor $(i_1, i_2, \ldots, i_p)$ will have received datum $a_{1,i_1}$ from
processor $(i_1, 0, \ldots, 0)$, datum $a_{2,i_2}$ from processor $(0, i_2, \ldots, 0)$, etc., and thus holds the
p-product element $(a_{1,i_1}, a_{2,i_2}, \ldots, a_{p,i_p})$.
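The two phases above can be checked with a small sequential simulation. This is a sketch under simplifying assumptions: each processor is a Python dict of registers, axes are 0-based (axis j stands for $x_{j+1}$), spreading along an axis is a plain loop rather than the $O(\log \ell_j)$ recursive doubling, and the second phase simply spreads each $A_j$ along every axis it does not yet cover instead of following the exact hyperplane order given in the text. The names `spread` and `p_product` are hypothetical.

```python
from itertools import product


def spread(grid, shape, reg, axis):
    """Copy register `reg` from every cell whose coordinate along `axis` is 0
    to all cells on the same line along `axis` (sequential stand-in for a
    "scan_with_copy"; timing is not modelled)."""
    for coord in product(*(range(n) for n in shape)):
        if coord[axis] == 0 and reg in grid[coord]:
            for k in range(1, shape[axis]):
                dest = coord[:axis] + (k,) + coord[axis + 1:]
                grid[dest][reg] = grid[coord][reg]


def p_product(sets):
    """Simulate the two-phase p-product on a p-dimensional processor array.
    sets[j] plays the role of A_{j+1}; returns {coordinate: p-tuple}."""
    p = len(sets)
    if p < 2:
        raise ValueError("need at least two sets")
    shape = tuple(len(s) for s in sets)
    grid = {c: {} for c in product(*(range(n) for n in shape))}
    # Initial layout: a_{j,i} lives in the processor with x_j = i, all else 0.
    for j, s in enumerate(sets):
        for i, value in enumerate(s):
            coord = tuple(i if axis == j else 0 for axis in range(p))
            grid[coord][j] = value
    # Phase 1: spread the A_j data along the next axis, cyclically.
    for j in range(p):
        spread(grid, shape, j, (j + 1) % p)
    # Phase 2 (simplified): spread each A_j along the axes it does not yet cover.
    for j in range(p):
        for axis in range(p):
            if axis not in (j, (j + 1) % p):
                spread(grid, shape, j, axis)
    return {c: tuple(regs[j] for j in range(p)) for c, regs in grid.items()}
```

For example, `p_product([["a", "b"], [1, 2, 3], ["x", "y"]])` maps processor `(1, 2, 0)` to the tuple `("b", 3, "x")`, and its twelve values are exactly the triple product of the three sets.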

The operation of spreading data along a single axis, which occurs during both
phases, can clearly be performed in $O(\ell_j)$ time, since nearest neighbors are adjacent
in the hypercube, but it can in fact be completed in $O(\log \ell_j)$ time. This is because
we may use a recursive doubling scheme to spread the data rapidly along the
axis. (Algorithms of this kind are described by Hillis [41].) In the parlance of the
Connection Machine Paris language, the operation is a “scan_with_copy.”
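A minimal sketch of the recursive-doubling spread, assuming the value starts in the first processor of a row of n processors. The function name `scan_with_copy` only echoes the Paris operation; it is a hypothetical helper, not the actual Paris interface. In each simulated step every processor that already holds the value sends it to the processor $2^d$ positions ahead, so $\lceil \log_2 n \rceil$ steps suffice.

```python
def scan_with_copy(value, n):
    """Spread `value`, held by position 0 of a row of n processors, to the
    whole row by recursive doubling.  Returns (row, number_of_steps)."""
    if n < 1:
        raise ValueError("row must contain at least one processor")
    row = [None] * n
    row[0] = value
    filled, steps = 1, 0
    while filled < n:
        # One synchronous step: each holder i < filled sends to i + filled.
        # Sources and destinations are disjoint, so the step is race-free.
        for i in range(min(filled, n - filled)):
            row[i + filled] = row[i]
        filled *= 2
        steps += 1
    return row, steps
```

Each step moves data a power-of-two distance along the axis, which is the kind of communication that the Gray-code embedding supports in a constant number of cycles.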

Power-of-two communication along each axis requires only O(1) communication
cycles, thanks to the Gray-code embedding. Specifically, if $G(i)$, $i = 0, 1, \ldots, n-1$,
is the standard binary-reflected Gray code ($n$ a power of 2), then it can be shown that
$G(i)$ and $G((i + 2^d) \bmod n)$ differ in at most two bits, and thus can be connected by
two communication cycles on a hypercube. This is true for any value of $d$.
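The two-bit property can be checked exhaustively for small hypercubes. The sketch assumes the binary-reflected Gray code $G(i) = i \oplus \lfloor i/2 \rfloor$; `gray` and `max_bits_differing` are hypothetical helper names.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def max_bits_differing(n, d):
    """Largest Hamming distance between G(i) and G((i + 2**d) mod n) over all i."""
    return max(bin(gray(i) ^ gray((i + 2 ** d) % n)).count("1") for i in range(n))


# Exhaustive check for hypercubes of dimension 1 through 10.
for log_n in range(1, 11):
    n = 2 ** log_n
    for d in range(log_n):
        assert max_bits_differing(n, d) <= 2, (n, d)
```

For $d = 0$ the distance is always one bit (ordinary Gray-code adjacency); for larger $d$ the XOR of the two codes has exactly two set bits, at positions $d-1$ and the top of the carry chain.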
3.4.2 Histograming

The second building-block algorithm is needed to perform histograming in
parallel. Histograming can be defined as follows: given a collection of data $\{a_i\}_{i=1}^{D}$
such that each $a_i$ is an element of a finite collection of possible values, say
$a_i \in \{1, 2, \ldots, V\}$, determine a count of the number of elements equal to each
possible output value, i.e., $H(k) = \#\{i \mid a_i = k\}$. The value $H(k)$ is also known
as the frequency of value $k$.

As pointed out in [66], there are three distinct approaches to parallel
histograming:

(1) Sequentially iterate from 1 through V; for each value k, allow each $a_i$ to
mark itself if it is equal to k, and then perform a parallel sum to count the
number of elements that were marked. Since parallel summing is $O(\log D)$,
this method has parallel complexity $O(V \log D)$.

(2) Perform additive writes; each processor looks at its value $a_i$ and sends a
message to processor $a_i$ to increment an accumulator. There are two virtual
processor sets: the initial set with D processors, one per data element $a_i$,
and a set of V processors in the output, containing one accumulator per
processor. The parallel complexity on an SIMD hypercube without additive
writes is $O(\log^2 D + \log V)$, although additive writes will typically be
handled by a method that is more efficient in the average case. In practice, the
messages to increment accumulators will be combined in a probabilistic routing
network, to avoid serialization at the location of the accumulators. The
complexity is in all cases $\Omega(\log D)$, since if all messages are destined
for a single processor, then the combination of the messages is equivalent to
a global sum; in practice, the complexity will depend upon V and D,
the efficiency of the routing algorithm, and the combining of messages in the
router.

(3) Sort the data, so that $a_{\pi(i)}$ (where $\pi(\cdot)$ is a permutation) forms a
nondecreasing sequence for $i = 1, \ldots, D$. For example, the Batcher bitonic sort
algorithm operates on a hypercube machine in $O(\log^2 D)$ time. After sorting,
each processor can determine whether the data in the processor to its left is
different. If so, it marks itself as the head of a constant-data block. Since each
processor needs to be able to communicate with its neighboring processor
for this step, the processors should be configured as a one-dimensional array
embedded in the hypercube, using a Gray-code embedding. The Batcher
sort process is still efficient in this configuration, although with a penalty in
the proportionality constant. Next, a segmented parallel prefix sum is used
to count the number of processors in each constant-data block, and this
information is delivered to the head processor of each block. Finally, each head
processor sends the information about the cardinality of its block to the
appropriate histogram bin, $a_{\pi(i)}$; this is the processor whose index is equal to
the data item shared by the processors of the block. Since the destinations
of the messages are distinct and ordered relative to the indices of the source
processors, these messages can be sent using an $O(\log D + \log V)$ contention-free
algorithm of the sort described by Nassimi and Sahni [73]. The total parallel
time complexity of histograming by sorting is thus $O(\log^2 D + \log V)$.
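Method (3) can be sketched sequentially as follows, assuming values in 1..V. Python's `sorted` stands in for Batcher's bitonic sort, a plain loop stands in for the segmented parallel prefix sum, and the count is read off at the last processor of each block rather than being forwarded explicitly to the head; `histogram_by_sorting` is a hypothetical name.

```python
def histogram_by_sorting(data, V):
    """Method (3) run sequentially: sort, mark block heads, count block sizes.

    `data` holds values in 1..V.  Returns H with H[k] = #{i : data[i] == k};
    H[0] is unused so that bin indices match the values.
    """
    s = sorted(data)                      # stand-in for Batcher's bitonic sort
    D = len(s)
    # A processor heads a constant-data block if its left neighbour differs.
    head = [i == 0 or s[i] != s[i - 1] for i in range(D)]
    # Segmented prefix sum of ones, restarting at every block head.
    count = [0] * D
    for i in range(D):
        count[i] = 1 if head[i] else count[i - 1] + 1
    # The last processor of each block now holds the block's cardinality,
    # which is sent to histogram bin s[i].
    H = [0] * (V + 1)
    for i in range(D):
        if i == D - 1 or head[i + 1]:
            H[s[i]] = count[i]
    return H
```

For instance, `histogram_by_sorting([3, 1, 3, 2, 3, 1], 4)` yields `[0, 2, 1, 3, 0]`: value 3 occurs three times, value 1 twice and value 2 once.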

For our purposes, the histogram vector is not needed; rather, we only need
knowledge of the few values that receive the most votes. To this end, the final stage
of sending messages (method (3) above) can be omitted, and the maximum counts
among the marked processors can be determined and relayed to the front end.
Thus the process of finding the maximum histogram bin can be accomplished in
$O(\log^2 D)$ time.

In  the  actual  implementation  of  our  system,  to  be  described  later,  we  used 
method  (2)  above.  However,  the  most  efficient  implementation  would  have  been 
to  use  method  (3)  in  conjunction  with  a  radix-sort  algorithm.  Lin  and  Kumar 
[65]  provide  a  hypercube-based  radix-sort  algorithm;  however,  because  they  sort 
from  high-order  bit  to  low-order  bit,  the  algorithm  is  unnecessarily  complicated. 
In  the  next  section,  we  present  a  simpler  method  for  performing  radix  sorting  on 
a  hypercube. 

3.4.2.1 A Novel Radix-Sort Algorithm

We now describe a novel algorithm for performing radix sorting on a hypercube.
This algorithm has the same time complexity as the one described by Lin and
Kumar [65], but it is much simpler. The time complexity of the algorithm is
$O(\log V \times \log D)$.

The  algorithm  is  outlined  in  Figure  3.2,  whereas  Figure  3.3  illustrates  the 
algorithm  for  a  simple  example  data  set. 
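The radix sort of Figure 3.2 can be simulated sequentially to check that each split-by-bit pass is stable and that $\log V$ passes suffice. This is a sketch: `itertools.accumulate` plays the role of the parallel prefix sum used for ranking, ranks are 1-based as in the figure, and `radix_sort_split` and its `bits` parameter (standing for $\log V$) are hypothetical names.

```python
from itertools import accumulate


def radix_sort_split(a, bits):
    """LSD radix sort following Figure 3.2, run sequentially."""
    a = list(a)
    D = len(a)
    for l in range(bits):
        b = [(x >> l) & 1 for x in a]
        # Steps a-b: rank the processors whose bit is 0 (prefix sum of 0-flags).
        r = list(accumulate(1 - bi for bi in b))
        t = r[-1] if D else 0        # number of zeros = maximum rank r_i
        # Steps c-d: rank the processors whose bit is 1.
        s = list(accumulate(b))
        # Step e: zeros move to processor r_i, ones to processor t + s_i
        # (1-based ranks, hence the "- 1" for 0-based list positions).
        out = [None] * D
        for i in range(D):
            dest = t + s[i] if b[i] else r[i]
            out[dest - 1] = a[i]
        a = out
    return a
```

Because zeros keep their relative order and are placed before the ones, which also keep their order, each pass is stable; this is what makes the low-order-bit-first schedule correct.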

3.5 The Geometric Hashing Connectionist Algorithm

In this section, we present the first of the two data-parallel algorithms for
performing geometric hashing on a hypercube-based SIMD architecture.

The  algorithm  is  based  on  a  “connectionist”  view  of  the  geometric  hashing 
Assume that the values in the sequence $\{a_i\}_{i=1}^{D}$ to be sorted are represented in binary bit
form, and let $\{b_{\ell,i}\}_{i=1}^{D}$ be the sequence of the $\ell$-th from-the-right bits. We sort the values
in a stable fashion.

For $\ell$ beginning at zero, and successively increasing to $\log V - 1$, we do the following:

a: Mark all processors with $b_{\ell,i} = 0$.

b: Rank these processors: each marked processor determines its relative position among
all marked processors using a parallel prefix sum (Nassimi and Sahni [73] describe a
RANK algorithm). Let $r_i$ be the rank of processor $i$ if it is marked, and let $t$ be the
maximum $r_i$.

c: Mark all processors with $b_{\ell,i} = 1$.

d: Rank these processors as well; let $s_i$ be the rank of the $i$-th such processor.

e: Move the $a_i$ data: every processor with $b_{\ell,i} = 1$ sends its data $a_i$ to processor $t + s_i$,
while every processor with $b_{\ell,i} = 0$ sends its data to processor $r_i$. Because the paths
of communication are ordered, this routing can be completed in $O(\log D)$ time, using
the CONCENTRATE and DISTRIBUTE algorithms from [73].

After the first iteration, all items are stably sorted with respect to their low-order bit. Upon
termination, the sequence $\{a_i\}_{i=1}^{D}$ will be sorted.

Figure 3.2 Simple Radix Sort on a Hypercube.
Figure 3.3 Radix Sort: an illustration.
method and is data parallel over the hash bin entries and the image features.
The database is assumed to contain M models; each model m has an associated
set of features, $\mathcal{F}_m = \{(x_{m,k}, y_{m,k})\}_{k=1}^{n}$, containing the coordinates of
the model's n features. Without loss of generality, the coordinates of the features
of all models are assumed to reside in the local memory of the host. The number
of features extracted from the input image is S, and the hash table consists of
B bins. Since we do not wish to restrict ourselves by making any assumptions
regarding the transformation that the database models are allowed to undergo,
the cardinality of the basis tuple will be a parameter, equal to c. Finally, the
availability of $O(Mn^{c+1})$ PE's is assumed.

3.5.1  Connectionist  Algorithm:  Preprocessing  Phase 

During  the  preprocessing  phase,  the  hash  table  is  created  for  the  set  of  M  models. 
Four virtual processor sets participate in this phase: the model feature set $V_1$, the
(c+1)-product set $V_2$, the hash bin set $V_3$, and the set of hash bin entries $V_4$. This
phase  is  completed  in  two  passes.  The  purpose  of  the  first  pass  is  to  determine 
the  number  of  entries  that  will  hash  to  each  bin  of  the  hash  table,  when  all  the 
models  in  the  database  are  considered.  During  the  second  pass  the  hash  entries 
are  actually  stored  in  the  hash  table. 

Preprocessing  Phase:  Pass  1 

For  each  model  m  do: 

Stage 1: The host relays the coordinates of the m-th model's features to the n
PE's of set $V_1$; the i-th element of the model's feature set will reside in the local
memory of the i-th PE of set $V_1$.
Stage 2: The (c+1)-product $(\mathcal{F}_m)^{c+1}$ is computed using the p-product algorithm
described in section 3.4. Each of the $n^{c+1}$ processors of set $V_2$ now contains a
(c+1)-tuple of the form $[(x_{i_1}, y_{i_1}), (x_{i_2}, y_{i_2}), \ldots, (x_{i_{c+1}}, y_{i_{c+1}})] \in (\mathcal{F}_m)^{c+1}$. The
first c points of each such tuple define an ordered basis and thus a coordinate
system. Some of those bases are formed by repeating the same feature point, and
thus are degenerate; the corresponding virtual processors will be disabled and will
not participate in the rest of the computation. Additionally, those processors in
charge of tuples where the (c+1)-st point coincides with one of the points forming
the basis will be disabled as well. The remaining processors proceed to compute
the coordinates of the (c+1)-st point in the coordinate system defined by the
basis. Subsequently, these coordinates are converted to a hash bin number, i.e.,
the index h of a processor in the two-dimensional set $V_3$.
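For the common case c = 2 (a two-point basis under similarity transformations), the Stage 2 computation can be sketched with complex arithmetic. Assumptions not in the text: points are complex numbers $x + iy$, the basis frame has its origin at the first basis point with the second basis point at $(1, 0)$, and `hash_bin` quantises onto a hypothetical square grid with made-up `bin_size` and `bins_per_axis` values, clamping out-of-range coordinates to edge bins.

```python
def basis_coordinates(b1, b2, p):
    """Coordinates of point p in the frame of the ordered basis (b1, b2).

    Points are complex numbers.  Under any similarity z -> s*z + t
    (complex s != 0) the result is unchanged, which makes it an invariant.
    """
    if b1 == b2:
        raise ValueError("degenerate basis")
    return (p - b1) / (b2 - b1)


def hash_bin(z, bin_size=0.25, bins_per_axis=64):
    """Quantise invariant z to a bin index h on a hypothetical square grid."""
    half = bins_per_axis // 2
    col = min(max(int(z.real // bin_size) + half, 0), bins_per_axis - 1)
    row = min(max(int(z.imag // bin_size) + half, 0), bins_per_axis - 1)
    return row * bins_per_axis + col
```

Using complex division keeps the sketch short: dividing by $b_2 - b_1$ removes rotation and scale in one step, and subtracting $b_1$ removes translation.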

Stage 3: Each processor of $V_2$ with information destined for a certain hash bin
sends an additive write with increment 1 to an accumulator in that bin. This is
effected by sending a message to the processor of $V_3$ in charge of the corresponding
hash bin. A subsequent parallel prefix operation on the resulting counts allows us
to organize the set $V_4$ of hash entries into a one-dimensional array so that entries
belonging to the same hash bin occupy a contiguous block of processors. There
is a total of B such blocks, and the length of each block is precisely equal to the
number of expected entries in the corresponding hash bin. The first processor of
a block is assumed to head the block. Finally, a map is built: the h-th processor
of $V_3$ stores locally the index $T_h$ of the $V_4$ processor heading the block of entries
for the h-th hash bin.
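The counting and prefix-sum step of Pass 1 can be sketched as follows. `Counter` stands in for the additive writes and `accumulate` for the parallel prefix operation; the extra sentinel entry `T[B]` (the total number of entries) is an assumption that makes the end of each block explicit, matching the range $T_h$ through $T_{h+1} - 1$ used in the recognition phase. `bin_heads` is a hypothetical name.

```python
from collections import Counter
from itertools import accumulate


def bin_heads(bin_of_entry, B):
    """Pass 1, Stage 3, run sequentially.

    bin_of_entry lists the hash bin h (0 <= h < B) computed by each active
    V_2 processor.  Returns (counts, T): counts[h] is the number of entries
    for bin h, and bin h occupies V_4 positions T[h] .. T[h+1] - 1.
    """
    hits = Counter(bin_of_entry)                 # additive writes of +1
    counts = [hits[h] for h in range(B)]         # Counter yields 0 if absent
    T = [0] + list(accumulate(counts))           # exclusive prefix sum + sentinel
    return counts, T
```

With bins `[2, 0, 2, 3, 2]` and B = 4, the counts are `[1, 0, 3, 1]` and the heads are `T = [0, 1, 1, 4, 5]`; the empty bin 1 gets a zero-length block.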
Figure 3.4 details the first pass for the case where the basis tuple consists of
two  points. 

Preprocessing  Phase:  Pass  2 

For  each  model  m  do: 

Stage  1:  The  same  as  in  the  first  pass. 

Stage  2:  The  same  as  in  the  first  pass. 

Stage 3: At this point, each of the active processors of $V_2$ has computed the
index h of a processor in the two-dimensional set $V_3$; the latter will receive a
message requesting the index of the next available processor in the group of $V_4$
processors headed by $T_h$. Clearly, more than one processor of $V_2$ may send a
request to the same $V_3$ processor. A variable in the local memory of the
h-th processor of $V_3$, initially equal to $T_h$, can provide the necessary information.
Combining this with an SIMD version of a parallel Fetch-and-Add resolves the
contention simply and effectively. This stage concludes with the active processors
of $V_2$ sending a tuple of the form $(m, (i_1, i_2, \ldots, i_c))$ to the appropriate processor
of $V_4$.
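A sequential sketch of the Pass 2 fetch-and-add, assuming the head array `T` (with a final sentinel holding the total number of entries) has already been produced by Pass 1. Serving requests in list order is just one serialisation of the SIMD fetch-and-add; any order yields a valid placement, because every request obtains a distinct slot. `place_entries` is a hypothetical name.

```python
def place_entries(requests, T):
    """Pass 2, Stage 3, run sequentially.

    requests: (h, record) pairs, one per active V_2 processor, where record
    is the (m, basis) tuple to store.  T: block heads with a final sentinel.
    Returns the array of hash entries held by V_4.
    """
    nxt = list(T[:-1])           # per-bin counter held by the h-th V_3 processor
    entries = [None] * T[-1]
    for h, record in requests:
        slot = nxt[h]            # fetch ...
        nxt[h] += 1              # ... and add
        entries[slot] = record   # the V_2 processor sends its record to slot
    return entries
```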


Figure  3.5  details  the  second  pass  for  the  case  where  the  basis  tuple  consists  of 
two  points. 

The hash table is represented by two data structures. The first of the two
structures (set $V_3$) contains one processor for each hash bin h, and gives the
pointers to a head processor $T_h$ of a block of data in the second structure (set $V_4$).
Figure 3.5 The third stage of the second pass of the preprocessing
phase  for  the  case  where  the  basis  tuple  consists  of  two  points. 


This latter structure consists of at most $M\binom{n}{c}(n-c)\,c!$ hash bin entries, which
are of the form $(m, (i_1, i_2, \ldots, i_c))$.

3.5.2  Connectionist  Algorithm:  Recognition  Phase 

Four virtual processor sets participate in the recognition phase: the feature
coordinate set $V_1$, the hash table sets $V_2$ and $V_3$, and the histogram bin set $V_4$. We
assume that the coordinates of the features that were extracted from the input
image reside in the local memory of the S processors of the set $V_1$; the
coordinates of the i-th image feature reside in the memory of the i-th processor of $V_1$.
The hash table is preloaded from storage into the local memory of the processors
comprising the sets $V_2$ and $V_3$: the VP set $V_2$ contains the pointers to the heads
of the blocks of entries, whereas $V_3$ is the one-dimensional array of concatenated
lists of hash entries. This phase of the algorithm proceeds in three stages.




Recognition  Phase 


Stage 1: The host selects a basis tuple (a set of c image features) and relays
its coordinates to the S processors of $V_1$. Each processor in $V_1$ combines the
coordinates of the feature stored locally with the broadcast tuple to compute the
feature's coordinates in the system defined by the basis. The coordinates are
subsequently used to determine the index of the hash bin (processor in $V_2$) to be
notified.

Stage 2: In the second stage, messages saying “you receive one vote” are sent
by the processors of $V_1$ to the appropriate processors in $V_2$. The messages are
sent using additive writes and the general routing facilities provided by the
interconnection network; multiple votes destined for the same recipient processor
combine en route to the destination.

Stage 3: In the last stage, every processor h from the set $V_2$ that received
one or more messages in the previous stage relays the number of votes that it
received to the block of processors $T_h$ through $T_{h+1} - 1$ of $V_3$. This operation
can be done, for example, using a modified version of Nassimi and Sahni's
GENERALIZE algorithm [73]. Alternatively, every processor h from set $V_2$ can send a
message containing the number of votes (which might be zero) to the processor
$T_h$ in $V_3$. Using a parallel prefix computation with “copy from the left” as the
binary associative operator, processor $T_h$ can then spread the count to the
remaining members of its group. At this point, we wish to histogram the entries
of the processors in the set $V_3$ using the multiplicities determined in the previous
step. Use of the radix sort algorithm that was described in section 3.4.2.1 offers
advantages from a complexity viewpoint. Each processor of $V_4$ is associated with
one histogram bin representing a tuple of the form $(m, (i_1, i_2, \ldots, i_c))$. Upon
termination of the histograming step, each of the $M\binom{n}{c}c!$ processors of $V_4$ contains
the frequency of the corresponding (model, basis) combination. A thresholding
operation on the vote tallies over the processors of the set $V_4$ recovers the winning
$(m, (i_1, i_2, \ldots, i_c))$ combinations. These combinations are communicated back to
the host, which will verify the existence of matching models.
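The three recognition stages can be sketched for a single basis choice as follows. The hash table is assumed to be given as the head array `T` (with a final sentinel) and the array `entries` of concatenated blocks (set $V_3$ in this phase). `Counter` stands in for both the additive writes and the weighted histogram, which the text performs with a radix sort; `recognize` and `threshold` are hypothetical names.

```python
from collections import Counter


def recognize(image_bins, T, entries, threshold):
    """Connectionist recognition phase for one basis choice, run sequentially.

    image_bins: hash bin index h computed by each V_1 processor (Stage 1).
    T, entries: the hash table; bin h's records are entries[T[h]:T[h + 1]].
    Returns the (model, basis) records whose vote tally reaches threshold.
    """
    votes = Counter(image_bins)                  # Stage 2: additive writes
    tally = Counter()
    for h, v in votes.items():
        # Stage 3: every record in bin h's block inherits the bin's vote
        # count, and the records are then histogrammed by (model, basis).
        for record in entries[T[h]:T[h + 1]]:
            tally[record] += v
    return sorted(r for r, n in tally.items() if n >= threshold)
```

A record that appears in several bins accumulates votes from each of them, which is exactly the weighted histogram over $V_3$ described above.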


Figure  3.6  details  the  recognition  phase  for  the  case  where  the  basis  tuple 
consists  of  two  points. 

3.5.3  Time  Complexity 

For the time complexity of the algorithm, we assume that $M\binom{n}{c}(n-c)\,c!$ PE's are
available. The interconnection network is a hypercube, and the concurrent-read,
exclusive-write (CREW) model of computation is adopted. We further assume
that  .1/  •//..!/  •  .S',  and  Mnc+1  B. 

Preprocessing Phase: The complexity of this phase is dominated by the third
stage of the second pass. During that stage, the $O(n^{c+1})$ processors of $V_2$ begin by
requesting, from a processor in $V_3$, the index of the processor in $V_4$ that will eventually
receive a message of the form $(m, (i_1, i_2, \ldots, i_c))$. Although, in principle, it
is possible that all the active processors of $V_2$ send their request to the same
processor of $V_3$, thus forcing serialization and increasing the time complexity of
this step to $O(n^{c+1} \log B)$, fewer than a small constant number of collisions are
Figure 3.6 The recognition phase of the parallel geometric hashing
connectionist  algorithm,  for  the  case  where  the  basis  tuple  consists  of 
two  points.  Note  how  tokens  flow  from  one  set  via  connections  to  the 
next  set. 
expected to occur on average.¹ As a result, the time complexity of this
step is $O(\log B)$ per database model. The active processors of $V_2$ continue by
forwarding a message of the form $(m, (i_1, i_2, \ldots, i_c))$ to the appropriate processor
of $V_4$; it should be noted that no two processors of $V_2$ will forward to the same
target processor of $V_4$. This second step can be completed in $O(\log(Mn^{c+1}))$
time per database model. The time complexity of the preprocessing phase is
thus $O(M \log(Mn^{c+1}))$, which is the same as $O(M \log(Mn))$. The preprocessing
phase is very expensive, even when carried out in parallel, but this is of no concern
since it is executed off-line.
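A one-line bound shows why the exponent $c + 1$ can be dropped inside these logarithms, assuming $c$ is a fixed constant and $M, n \ge 1$; the same argument, with $SM$ in place of $M$, justifies replacing $\log(SMn^{c+1})$ by $\log(SMn)$ in the recognition-phase analysis.

```latex
\log(Mn) \;\le\; \log\!\left(Mn^{c+1}\right) = \log M + (c+1)\log n
  \;\le\; (c+1)\left(\log M + \log n\right) = (c+1)\log(Mn),
\qquad\text{hence}\qquad
O\!\left(\log(Mn^{c+1})\right) = O\!\left(\log(Mn)\right).
```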

Recognition Phase: The time complexity of the recognition phase, per broadcast
basis tuple, is dominated by the histograming step. In fact, the time complexity
of the remaining operations of the recognition phase is no worse than
$O(\log(Mn^{c+1}))$, which is the same as $O(\log(Mn))$. The complexity of
histograming depends on the particular method that is used; if Batcher's bitonic
sort [76] is used to perform the histograming, the time complexity of the
recognition phase is $O(\log^2(SMn^{c+1}))$ or, equivalently, $O(\log^2(SMn))$. On the other
hand, if the radix sort algorithm that was described in section 3.4.2.1 is used, the
time complexity of the recognition phase drops to $O(\log(SMn^{c+1}) \log(Mn^c))$ or
$O(\log(SMn) \log(Mn))$.

3.6  The  Hash-location  Broadcast  Algorithm 

In  this  section,  we  present  the  second  of  the  two  data-parallel  algorithms  for 
performing  geometric  hashing. 

¹ This, in fact, is a basic characteristic of geometric hashing.
Unlike the first algorithm, which is based on a “connectionist” view of the
geometric hashing method, the second algorithm uses the parallel machine
as a source of “intelligent memory.” The algorithm is inspired by the method of
inverse indexing [87], and is data parallel over combinations of small subsets of
model features.

The main characteristic of the connectionist algorithm is that the various
(model, basis) combinations are the data items; these combinations are grouped
by means of the hash table according to the invariant tuples that they generate.
In the broadcast approach, on the other hand, the data items are the invariants
generated by the (model, basis) combinations, and they can be grouped according
to the feature combinations that generate them.

Again, the database is assumed to contain M models, and each model m
has associated with it a set of features, $\mathcal{F}_m = \{(x_{m,k}, y_{m,k})\}_{k=1}^{n}$. Without loss of
generality, the coordinates of the features of all models are assumed to reside in the
local memory of the host computer. The number of features extracted from the
input image is S. Furthermore, we make no assumptions with regard to the
transformation that the database models are allowed to undergo; the cardinality
of the basis tuple will be a parameter, equal to c. Finally, the availability of
$O(Mn^{c+1})$ PE's is assumed.

3.6.1  The  Data  Structure 

The main idea behind the algorithm is the following. Let us assume that we
have formed a basis tuple, B, by selecting a set of c features from model m. The
coordinates of each of the remaining $n - c$ features of model m, in the coordinate
system defined by B, will need to be computed; this will generate a total of
$n - c$ coordinate pairs. In other words, there will be one coordinate pair for
every subset of model features comprising the selected basis B and one of the
remaining $n - c$ features of m. One could conceivably dedicate one PE to each
such combination, requiring $\binom{n}{c}(n-c)\,c!$ PE's per database model; each such PE
holds the corresponding coordinate pair.

Evidently, a data structure that is different from the hash table data structure
of section 3.5 is required. The data can be regarded as a collection of records of
the form $(m, (i_1, i_2, \ldots, i_c), k, (x, y))$, where $(x, y)$ is the hash location to which
the k-th feature of model m maps under the basis tuple $B = (i_1, i_2, \ldots, i_c)$.
This information can be organized in a (c+2)-dimensional table indexed by
$(m, (i_1, i_2, \ldots, i_c), k)$. Recall that $i_1, i_2, \ldots, i_c, k$ are integers between 1 and n; m
is an integer between 1 and M. Clearly, not all of the $Mn^{c+1}$ table locations
will be used: for example, the entries of the table corresponding to degenerate
bases will be empty. Similarly, the table locations corresponding to feature
subsets where the k-th model feature has also been used to form B will also be
empty. The self-index of a table location suffices to recover the corresponding
$(m, (i_1, i_2, \ldots, i_c), k)$ information. In the sequel, we will refer to this data
structure as the hash-function table.
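Because the hash-function table is a dense (c+2)-dimensional array, the self-index of a location is just a mixed-radix number. The sketch below assumes 0-based indices and row-major order (the text uses 1-based indices and does not fix a layout); `table_index`, `table_location` and `is_used` are hypothetical names.

```python
def table_index(m, basis, k, n):
    """Row-major self-index of location (m, (i_1, ..., i_c), k); 0-based."""
    idx = m
    for i in (*basis, k):
        idx = idx * n + i
    return idx


def table_location(idx, n, c):
    """Inverse of table_index: recover (m, (i_1, ..., i_c), k)."""
    digits = []
    for _ in range(c + 1):
        idx, r = divmod(idx, n)
        digits.append(r)
    digits.reverse()                 # digits are now i_1, ..., i_c, k
    return idx, tuple(digits[:c]), digits[c]


def is_used(basis, k):
    """Only non-degenerate bases with k outside the basis hold data."""
    return len(set(basis)) == len(basis) and k not in basis
```

With n = 5 and c = 2, each model owns $5^3 = 125$ locations, of which $5 \cdot 4 \cdot 3 = 60$ are used.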

3.6.2  Hash-location  Broadcast:  Preprocessing  Phase 

During  the  preprocessing  phase  the  hash-function  table  is  constructed  for  the  set 
of M models. Three sets of virtual processors participate in this phase: the model
feature set $V_1$, the (c+1)-product set $V_2$, and the hash-function table set $V_3$. Each
processor  of  V3  is  associated  with  exactly  one  table  location.  This  phase  consists 
of  three  stages. 
Preprocessing Phase

For  each  model  m  do: 

Stage 1: The host relays the coordinates of the m-th model's features to the n
PE's of set $V_1$; the i-th element of the model's feature set will reside in the local
memory of the i-th PE of set $V_1$.

Stage 2: The (c+1)-product $(\mathcal{F}_m)^{c+1}$ is computed using the p-product algorithm
described in section 3.4. Each of the $n^{c+1}$ processors of set $V_2$ now contains a
(c+1)-tuple of the form $[(x_{i_1}, y_{i_1}), (x_{i_2}, y_{i_2}), \ldots, (x_{i_{c+1}}, y_{i_{c+1}})] \in (\mathcal{F}_m)^{c+1}$. The
first c points of each such tuple define an ordered basis and thus a coordinate
system. Some of those bases are formed by repeating the same feature point, and
thus are degenerate; the corresponding virtual processors will be disabled and
will not participate in the rest of the computation. Additionally, those processors
in charge of tuples where the (c+1)-st point coincides with one of the points
forming the basis will be disabled as well. The remaining processors proceed to
compute the coordinates of the (c+1)-st point in the coordinate system defined
by the basis.

Stage 3: Each processor of $V_2$ with information destined for a certain table
location sends a tuple of the form $(m, (i_1, i_2, \ldots, i_c), k, (x, y))$ to that location.
Since the destinations of these messages are pairwise distinct, no collisions can
occur during this stage.
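A sequential sketch of this preprocessing phase for two-point bases (c = 2) under similarity transformations. The conventions are assumptions: feature points are complex numbers with no two coincident, the basis frame puts the first basis point at the origin and the second at $(1, 0)$, and a Python dict keyed by $(m, (i_1, i_2), k)$ stands in for the table of $V_3$ processors, with degenerate locations simply left out. `build_hash_function_table` is a hypothetical name.

```python
def build_hash_function_table(models):
    """Preprocessing for the broadcast algorithm with two-point bases.

    models[m] is a list of complex feature points.  Returns a dict mapping
    (m, (i1, i2), k) -> invariant coordinates (x, y) of feature k in the
    frame of the ordered basis (i1, i2).
    """
    table = {}
    for m, pts in enumerate(models):
        n = len(pts)
        for i1 in range(n):
            for i2 in range(n):
                if i1 == i2:
                    continue                      # degenerate basis
                for k in range(n):
                    if k in (i1, i2):
                        continue                  # k is part of the basis
                    z = (pts[k] - pts[i1]) / (pts[i2] - pts[i1])
                    table[(m, (i1, i2), k)] = (z.real, z.imag)
    return table
```

Each key is written exactly once, mirroring the observation that the Stage 3 messages have pairwise distinct destinations.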
Figure 3.7 Hash-location broadcast algorithm: the preprocessing
phase  for  the  case  where  the  basis  tuple  consists  of  two  points. 
Figure 3.7 graphically depicts the preprocessing phase for the case where the
basis  tuple  consists  of  two  points. 

3.6.3  Hash-location  Broadcast:  Recognition  Phase 

Two virtual processor sets participate in this phase: the feature coordinate set
$V_1$, and the hash-function table set $V_2$.

We assume that the coordinates of the features that were extracted from the
input image reside in the local memory of the S processors of the set $V_1$; the
coordinates of the i-th image feature reside in the memory of the i-th processor
of $V_1$. The hash-function table is again preloaded from storage into the local
memory of the processors of $V_2$. This phase proceeds in three stages.

Recognition  Phase 


Stage 1: The host selects a basis tuple (a set of c image features) and relays
its coordinates to the S processors of $V_1$. With the exception of the c processors
whose interest points form the selected basis, each processor in $V_1$ combines the
coordinates of the feature stored locally with the broadcast tuple to compute the
feature's coordinates in the system defined by the basis. These operations involve
minimal data movement and thus are extremely fast.

Stage 2: In the second stage, the data from the $S - c$ processors in $V_1$ are
successively broadcast to all the processors of the set $V_2$. Each broadcast contains a
coordinate pair of the form $(u, v)$, and gives a location in the hash table where a vote
should be tallied. Each processor in $V_2$, indexed by $(m, (i_1, i_2, \ldots, i_c), k)$ with
$(i_1, i_2, \ldots, i_c)$ corresponding to a valid basis tuple for model m, contains a hash
location (coordinate pair) $(x, y)$, which the processor can compare against $(u, v)$. If the
two locations are sufficiently close together, then the corresponding table location
records a hit indicating a vote for model m and basis $(i_1, i_2, \ldots, i_c)$. We will see
in chapter 7 that an extremely useful modification permits the use of weighted
voting for model-basis tuples, determined according to the relative proximity of
$(u, v)$ to $(x, y)$. This vote originates from the particular feature (among the $S - c$
extracted from the input image) whose coordinates in the frame of the selected
basis are being broadcast. The tallying of votes continues by accumulating hits
in each hash-function table location; each of the $S - c$ broadcasts generates either
one or no hits at any table location.

Stage 3: Once the tallying step is complete, a third stage is invoked. Using a segmented parallel-sum operation, we add the votes over $k$ among the locations $(m, (i_1, i_2, \ldots, i_c), k)$. The result is the total number of votes that model $m$ and basis $(i_1, i_2, \ldots, i_c)$ obtain for the given scene and basis selection. Finally, a global-max operation among the processors holding these vote sums determines the winning (model, basis) combination. A final verification step may be added to determine the quality of the match.
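The three stages can be mimicked sequentially with NumPy, with one array axis standing in for each index of the processor set $V_2$. This is a minimal sketch and not the Connection Machine implementation. The following choices are our own illustrative assumptions:

- the array layout;
- the proximity threshold `eps`;
- the use of `np.inf` to mark the $k$ slots occupied by basis members;
- the flattening of the basis-tuple index into a single axis.

```python
import numpy as np

def tally_votes(table, broadcasts, eps):
    """Sequential stand-in for Stages 2 and 3 of the recognition phase.

    table      : (M, T, n, 2) array; table[m, t, k] is the hash location (x, y)
                 held by the V_2 processor (m, basis tuple t, k). Slots k that
                 are basis members hold np.inf so they can never be hit.
    broadcasts : (S - c, 2) array of broadcast scene coordinates (u, v).
    eps        : hypothetical proximity threshold for a "hit".
    """
    hits = np.zeros(table.shape[:3], dtype=np.int64)
    # Stage 2: each broadcast is compared against every table location at once
    # (the SIMD step); a broadcast adds at most one hit per location.
    for uv in broadcasts:
        dist = np.linalg.norm(table - uv, axis=-1)
        hits += dist < eps
    # Stage 3: segmented sum over k, then a global max over (model, basis).
    votes = hits.sum(axis=2)
    m, t = np.unravel_index(np.argmax(votes), votes.shape)
    return (int(m), int(t)), votes

# Toy usage: 3 models, 4 basis tuples each, n = 5 slots, of which k = 0, 1
# are the basis members.
rng = np.random.default_rng(0)
table = rng.uniform(-1.0, 1.0, size=(3, 4, 5, 2))
table[:, :, :2] = np.inf
broadcasts = table[1, 2, 2:] + 1e-9   # scene points matching model 1, basis 2
winner, votes = tally_votes(table, broadcasts, eps=1e-6)
print(winner, votes[winner])          # (1, 2) 3
```

Summing over the last table axis is what the segmented parallel sum does in a single step on the SIMD machine.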


Figure 3.8 details the recognition phase for the case where the basis tuple consists of two points.

Figure 3.8 The recognition phase of the parallel hash-location broadcast algorithm, for the case where the basis tuple consists of two points. The flowchart has four steps:
1. Each PE in turn broadcasts its local coordinate pair to the entire table.
2. Each table PE compares its local entry against the broadcast coordinate pair and increments a local counter according to the outcome.
3. The votes that each model/basis combination receives are computed.
4. The winning combination is determined.

An asymptotically faster alternative to Stages 2 and 3 described above also exists. Strictly speaking, each of the $S - c$ broadcasts will require $O(\log(Mn^{c+1}))$ time, since there are $Mn^{c+1}$ processors in the data set $V_2$. However, if each processor of set $V_2$ has $S$ storage locations, the theoretical complexity can be reduced as follows.


Assume for simplicity that $S = n^c$. Then all $S - c$ broadcasts can be done simultaneously. Each of the $S - c$ active processors in $V_1$ sends its data to a unique processor in a $c$-dimensional $(n \times n \times \cdots \times n)$ slice of the $(c+2)$-dimensional data set $V_2$. This routing can be completed in $O(\log n)$ time. The slice of data can subsequently be spread to the rest of $V_2$ in parallel slices, requiring no more than $O(\log(Mn))$ time. After this spread, the entire set of computed coordinate pairs is distributed among the $n^c$ processors of each slice, one coordinate pair per processor, and exactly $c$ processors in each slice are empty. The processors within a slice can now exchange their data so that the entire list of computed coordinate pairs becomes available to every one of them. This can be achieved by a recursive doubling procedure that communicates data between pairs of processors and forms lists of coordinate pairs. Note, though, that the entries of those lists will not necessarily appear in the same order in each processor. Since the lists double in length at each step, the recursive doubling procedure can be completed in $O(S)$ time. Total time complexities are summarized in the next subsection.
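The recursive doubling exchange can be simulated on a list of $P = 2^d$ processors, with list positions playing the role of hypercube node addresses. This is a sketch under our own assumptions: the slice size is a power of two, and `None` marks an empty processor. It shows three things:

- every processor ends up with the full list;
- the order of the entries differs from processor to processor;
- the cost is linear in the slice size. The lists double at every round, so each processor receives $1 + 2 + \cdots + P/2 = P - 1$ items in total.

```python
def recursive_doubling_allgather(values):
    """Simulated all-gather by recursive doubling on P = 2**d processors.

    In round r, processor p exchanges its whole current list with the
    processor whose address differs from p in bit r, and appends the
    partner's list to its own.
    """
    P = len(values)
    if P == 0 or P & (P - 1):
        raise ValueError("the number of processors must be a power of two")
    lists = [[v] for v in values]
    bit = 1
    while bit < P:
        # The right-hand side is built entirely from the previous round's lists.
        lists = [lists[p] + lists[p ^ bit] for p in range(P)]
        bit <<= 1
    return lists

# A toy slice of 8 processors, two of which are empty (None).
slice_data = [(0.1, 0.2), (0.3, 0.4), None, (0.5, 0.6),
              (0.7, 0.8), None, (0.9, 1.0), (1.1, 1.2)]
result = recursive_doubling_allgather(slice_data)
print(result[0][:3])  # [(0.1, 0.2), (0.3, 0.4), None]
print(result[1][:3])  # [(0.3, 0.4), (0.1, 0.2), (0.5, 0.6)]
```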


3.6.4 Time Complexity

For the time complexity of the two phases, we assume that $M\binom{n}{c}(n-c)\,c!$ PEs are available. The interconnection network is a hypercube, and we adopt the concurrent-read-exclusive-write (CREW) model of computation. We further assume that $M \gg n$, $M \gg S$, and $Mn^{c+1} \gg B$.
Preprocessing Phase: The complexity of this phase is dominated by the third stage. During that stage, each processor of set $V_1$ sends a message of the form $(m, (i_1, i_2, \ldots, i_c), (x, y))$ to a processor in $V_2$. The destinations of all these messages are pairwise distinct, so no collision-resolution protocol is required. As a result, the time complexity of this stage is $O(\log(Mn))$ per database model, and the time complexity of the preprocessing phase is $O(M \log(Mn))$. The preprocessing phase is very expensive, even when carried out in parallel. Again, this is of no concern, since it is executed off-line.

Recognition Phase: The time complexity of the recognition phase, per broadcast basis tuple, is dominated by the second stage. The first stage takes $O(\log S)$ time. The second stage, using the data-spreading trick described at the end of the previous subsection, takes no more than $O(S + \log(Mn))$ time. This is also the complexity of the recognition phase for the hash-location broadcast algorithm.

3.7 Implementation Details

In this section, we present in detail the actual implementations of the two algorithms already described. Both algorithms were implemented on a hypercube-based SIMD Connection Machine.

Implementing carefully crafted parallel algorithms on existing architectures frequently involves more compromises than one might expect. In our case, the two algorithms that we have described assume the existence of $O(Mn^{c+1})$ PEs. For $M = 1024$, $n = 16$, and the similarity transformation ($c = 2$), this would entail a 4Meg-processor machine. Although 64K-processor Connection Machines exist, it is more usual to have access to a 32K- or 16K-processor model and to do the prototyping on an 8K-processor model. Conceivably, one could use the Connection Machine software facilities to simulate a larger parallel machine. Multiple virtual processors would be mapped onto each physical processor and executed in round-robin fashion. Unfortunately, the overhead involved in the mapping makes the implied "virtual processor ratio" of 512 impractical. Accordingly, we must modify the algorithms somewhat.
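The processor counts quoted above can be checked with a few lines of arithmetic. The sketch below (plain Python; the variable names are ours) compares two quantities:

- the $Mn^{c+1}$ processors of the full table;
- the $M\binom{n}{c}c!\,(n-c)$ processors assumed in the time-complexity analysis, one per (model, ordered basis tuple, remaining point).

It also computes the virtual processor ratio on an 8K (8192-processor) machine.

```python
from math import comb, factorial

M, n, c = 1024, 16, 2            # models, points per model, basis size (similarity)
physical_pes = 8 * 1024          # an 8K-processor Connection Machine

full_table = M * n ** (c + 1)                            # O(M n^{c+1}) processors assumed
exact_entries = M * comb(n, c) * factorial(c) * (n - c)  # (model, ordered basis, other point)
vp_ratio = full_table // physical_pes

print(full_table, exact_entries, vp_ratio)  # 4194304 3440640 512
```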

Connectionist Algorithm. Rather than associating a separate processor with each hash entry, we store the entire list of entries for a hash bin in the local memory of a single processor. All the entries that hash into the same bin are stored contiguously in one physical processor's memory, instead of being spread across distinct physical processors. More specifically, each hash table bin is mapped onto a physical processor. That processor maintains a local list of all the $(m, (i_1, i_2, \ldots, i_c))$ entries hashing into the corresponding bin. The required number of processors drops to $B$, the number of desired hash bins.

The preprocessing phase is now far less efficient, because entries must be appended to the lists through random access to local memory. The lengths of the lists will vary over the hash table, and some lists will be empty. For a given database of models, the typical occupancies of uniformly quantized hash bins are non-uniform (see Figure 3.9). Provided that no single list is exorbitantly long, memory requirements are not a problem. The collisions that are likely to occur during this phase are resolved using a simple protocol based on "locks."
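A serial analogue of the connectionist organization for $c = 2$ is sketched below. Each hash bin owns a list of $(m, (i_1, i_2))$ entries. Several details are our own assumptions rather than the method of the text:

- The similarity invariants are computed by complex division, with the origin at the midpoint of the basis pair and the unit vector along $\mathbf{p}_{i_2} - \mathbf{p}_{i_1}$.
- The quantization step `BIN` is arbitrary.
- A Python `Counter` stands in for the additive-write histogram.

```python
from collections import Counter, defaultdict
from itertools import permutations

import numpy as np

BIN = 0.1  # hypothetical quantization step of the (u, v) hash space

def similarity_invariants(points, i1, i2):
    """(u, v) of every point in the frame of the ordered basis (i1, i2).

    Assumed frame: origin at the midpoint of the basis pair, unit vector
    along points[i2] - points[i1]. Complex division removes translation,
    rotation and scale.
    """
    z = points[:, 0] + 1j * points[:, 1]
    w = (z - 0.5 * (z[i1] + z[i2])) / (z[i2] - z[i1])
    return w.real, w.imag

def hash_bin(u, v):
    return (int(np.floor(u / BIN)), int(np.floor(v / BIN)))

def build_table(models):
    """Preprocessing: every bin keeps a list of (m, (i1, i2)) entries."""
    table = defaultdict(list)
    for m, pts in enumerate(models):
        for i1, i2 in permutations(range(len(pts)), 2):
            u, v = similarity_invariants(pts, i1, i2)
            for k in range(len(pts)):
                if k not in (i1, i2):
                    table[hash_bin(u[k], v[k])].append((m, (i1, i2)))
    return table

def probe(table, scene, s1, s2):
    """One probe: each bin that receives a vote votes for every entry in its list."""
    u, v = similarity_invariants(scene, s1, s2)
    votes = Counter()
    for k in range(len(scene)):
        if k not in (s1, s2):
            votes.update(table.get(hash_bin(u[k], v[k]), []))
    return votes.most_common(1)[0]

# Toy database: 10 models of 8 points. The scene holds model 7 rotated by
# 90 degrees, scaled by 2 and translated, followed by 10 clutter points.
rng = np.random.default_rng(1)
models = [rng.uniform(-1.0, 1.0, size=(8, 2)) for _ in range(10)]
table = build_table(models)
rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])
scene = np.vstack([2.0 * models[7] @ rot90.T + np.array([3.0, -1.0]),
                   rng.uniform(-4.0, 4.0, size=(10, 2))])
best, count = probe(table, scene, 0, 1)
print(best, count)
```

Probing with the scene points that correspond to model points 0 and 1 should make the entry for model 7 with basis (0, 1) the winner. Its vote count is at least the number of remaining model points, plus any clutter that happens to fall into the same bins.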

During recognition, the entries in the hash bins that receive votes must be histogrammed (i.e., counted), with multiplicity equal to the number of votes that each hash bin receives. Consider $M$ models, each having $n$ points, and a basis tuple comprising $c$ features. Each hash entry then needs only $\lceil \log M + c \log n \rceil$ bits: $\log M$ bits for the model index, and $\log n$ bits for the index of each of the $c$ basis members. Rather than histogramming by sorting, we opt, as already indicated, for a message-passing strategy (additive writes). We set up $2^{\lceil \log M + c \log n \rceil}$ buckets, one for each $(m, (i_1, i_2, \ldots, i_c))$ combination. On the 8K-processor machine, this results in a virtual processor ratio of $2^{\lceil \log M + c \log n \rceil - 13}$.
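The bucket arithmetic can be made concrete. In the sketch below (the helper names are hypothetical), a histogram bucket index is formed as the mixed-radix number $m\,n^c + \sum_j i_j\,n^{c-1-j}$. This number always fits in $\lceil \log_2 M + c \log_2 n \rceil$ bits, and it coincides with plain bit-field concatenation when $M$ and $n$ are powers of two. The 8192-processor default corresponds to the 8K machine.

```python
def bucket_layout(M, n, c, physical_pes=8192):
    """Bits per hash entry, number of histogram buckets, virtual processor ratio."""
    bits = (M * n ** c - 1).bit_length()   # = ceil(log2 M + c log2 n), no floating point
    buckets = 2 ** bits
    return bits, buckets, buckets // physical_pes

def pack(m, basis, n):
    """Mixed-radix bucket index for the entry (m, (i_1, ..., i_c))."""
    index = m
    for i in basis:
        index = index * n + i
    return index

def unpack(index, n, c):
    """Inverse of pack: recover (m, (i_1, ..., i_c))."""
    basis = []
    for _ in range(c):
        index, i = divmod(index, n)
        basis.append(i)
    return index, tuple(reversed(basis))

print(bucket_layout(1024, 16, 2))            # (18, 262144, 32)
print(unpack(pack(5, (3, 11), 16), 16, 2))   # (5, (3, 11))
```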

The processors in charge of the hash bins that received one or more votes scan their local lists. For each entry $(m, (i_1, i_2, \ldots, i_c))$ encountered, they cast a vote by sending a message to the corresponding histogram bucket. The time needed for the list traversal is clearly dominated by the longest such list.

This histogramming process currently accounts for 99% of the execution time of the recognition phase, so more efficient histogramming would greatly improve the performance of the implementation. In particular, our radix-sort-based method of Section 3.4.2 is expected to reduce processing times considerably. Further gains in efficiency can be achieved by requantizing the hash space and by exploiting symmetries; both are examined in more detail in chapter 5.

Hash-location Broadcast Algorithm. To reduce the virtual processor ratio for set $V_2$ in the implementation of the second algorithm, we assign one processor to each index $(m, (i_1, i_2, \ldots, i_c))$. That processor stores contiguously in its local memory the $n$ entries associated with $k = 1, 2, \ldots, n$. This in effect collapses one of the dimensions of the hash-function table. Consequently, during the construction of the hash-function table, precisely $n - c$ processors attempt to deposit a distinct tuple at the same destination. The use of "locks" provides the necessary serialization.

During recognition, the $S$ processors of $V_1$ rank themselves using either the index of the locally available image feature or their own address. Either approach involves no data movement and is therefore extremely fast. Each processor then computes the relative coordinates of its local feature in the frame determined by the broadcast basis. Each of the $S - c$ processors of $V_1$ in turn broadcasts these coordinates to all of the processors of the hash-function table. Every processor of $V_2$ compares the broadcast location with each of the $n - c$ locations stored in its local memory, updating a local counter if necessary. Each processor of $V_2$ thus performs a total of $(S - c)(n - c)$ comparisons per basis selection. When all image features have been exhausted, a global-max operation on the values of the local counters recovers the winning $(m, (i_1, i_2, \ldots, i_c))$ combination. Here too, exploiting symmetries yields computational savings; these are examined in more detail in chapter 5.


Languages. A number of special-purpose parallel languages have been developed for the Connection Machine. Nevertheless, we found that C code running on the host, with calls to the Connection Machine through its Paris package, best suited our needs. The Paris package includes routines that give us the greatest level of control over the machine. For the broadcast-based algorithm, just about any language would suffice, and the Paris primitives represent a fast development path.

3.8 Implementation Results / Scalability

Models (dot patterns) of 16 points each were generated using either a uniform distribution over a region or a Gaussian distribution. After generating 1024 such models, we constructed scenes of approximately 200 points. Each scene contained a single model, embedded after translation, rotation, and scaling. Noise was added to the scene points through quantization round-off error.

In both implementations, the front end randomly selects a pair of scene points (a probe) as the basis to be used for possible recognition. For the connectionist algorithm, a probe takes 5.05 seconds on an 8K-processor machine, dropping to 0.80 seconds on a 32K-processor machine (see the plots in Figure 3.10). The hash table employed contained 86.6% of the total number of hash entries. The plots show that the connectionist algorithm operates in a roughly linear regime, i.e., the heavy loading yields linear speedup. In fact, as the number of processors increases, reduced contention in the routing algorithm gives us, in some cases, an apparent extra boost: the drop from 5.05 to 0.80 seconds is a factor of about 6.3 for a fourfold increase in processors. Such improvements would not continue indefinitely.

Figure 3.10 Average time required for a single basis probe, as a function of the number of processors in the Connection Machine (panels: connectionist algorithm; broadcast algorithm). The database contains 1024 models of 16 points each, and the scenes contain 200 points.

In the broadcast algorithm, the 8K-processor machine processes a probe at a rate of 10 msec per scene point, i.e., approximately 2.0 seconds for a probe using a 200-point scene. The hash-function table employed contained 100% of the total number of hash entries for the given database. Experiments with 16K- and 32K-processor models indicate nearly linear increases in speed (see Figure 3.10), so a 64K-processor machine should be able to perform a probe in about 300 milliseconds.

By way of comparison, both algorithms are easily coded on a typical high-performance workstation. Performance results are highly dependent on disk access rates and available memory. Even so, we have seen probe times of roughly 35 seconds for the equivalent of the hash-location broadcast algorithm on a Sun SPARCstation 2.
Chapter 4


Distributions of Invariants


In this chapter, we examine how indices are distributed over the space of invariants. In particular, we derive exact and approximate formulas, as well as qualitative results, for a number of combinations of transformation and feature distribution.

Soon after the geometric hashing technique was conceived, researchers discovered that one of its main characteristics was the non-uniform distribution of indices over the space of invariants [21,25,89]. This non-uniformity, which appears to be endemic to all indexing-based approaches to object recognition, typically poses problems. A number of heuristics were invented to alleviate those problems [25], but the results were not particularly promising.

In our study, we concentrate on the rigid, similarity and affine transformations. We assume that the model point features are generated either by a Gaussian random process of standard deviation $\sigma$, or by a process that is uniform over the unit disc or the unit square.

As chapter 5 will demonstrate, knowledge of these distributions is particularly important. It allows a number of performance enhancements in the implementations of the algorithms described in chapter 3. Knowledge of these distributions also proves instrumental in the development of a Bayesian approach to model matching with geometric hashing (see chapter 7).

4.1 Rigid Transformation

We begin our study of the distribution of indices over the space of invariants with the rigid transformation: the database models may undergo rotation and translation only.

Let $\mathbf{p}_{\mu_1}$, $\mathbf{p}_{\mu_2}$, and $\mathbf{p}$ be the position vectors of three point features belonging to model $m$. Then the tuple $(u, v)$ satisfying Eqn. 2.1 is unique and invariant under rigid transformations.

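Let us further assume that the point features $\mathbf{p}_{\mu_1}$, $\mathbf{p}_{\mu_2}$, and $\mathbf{p}$ are generated by the Gaussian random process $\mathcal{N}\!\left(\mathbf{0}, \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}\right)$. This is a two-dimensional Gaussian process with mean value $(0, 0)$ and covariance matrix $\begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}$. Standard probability theory then gives the joint distribution $f(u, v)$ of $u$ and $v$ as

$$f(u,v) = \int_{\mathbb{R}^4} f\big(x(u,v), y(u,v)\big)\, f(x_{\mu_1}, y_{\mu_1})\, f(x_{\mu_2}, y_{\mu_2})\, |J|^{-1}\, dx_{\mu_1}\, dx_{\mu_2}\, dy_{\mu_1}\, dy_{\mu_2}, \tag{4.1}$$

where $J$ is the Jacobian of the transformation. Evaluating this integral yields the following distribution of indices over the space of invariants:

$$f(u,v) = \frac{1}{3\pi\sigma^2}\, e^{-\frac{u^2+v^2}{3\sigma^2}}. \tag{4.2}$$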

Figure 4.1 shows the distribution over the space of invariants and several of its contours.

Figure 4.1 The distribution over the space of invariants, and several of its contours, for point features generated by the Gaussian process $\mathcal{N}\!\left(\mathbf{0}, \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}\right)$. The allowed transformation is rigid.
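Eqn. 4.2 states that, for Gaussian features, $(u, v)$ is itself an isotropic Gaussian with variance $3\sigma^2/2$ per coordinate. A quick Monte Carlo check is sketched below. Eqn. 2.1 is not reproduced in this chapter, so the frame used here is an assumption:

- origin at the midpoint of $\mathbf{p}_{\mu_1}$ and $\mathbf{p}_{\mu_2}$ (the natural choice noted in the footnote of Section 4.3);
- unit vector along $\mathbf{p}_{\mu_2} - \mathbf{p}_{\mu_1}$;
- no scaling.

Under that assumption, the radial cumulative distribution implied by Eqn. 4.2 is $1 - e^{-r^2/(3\sigma^2)}$.

```python
import numpy as np

def rigid_invariants(p1, p2, p):
    """(u, v) of p in the frame with origin at the midpoint of p1, p2 and x-axis
    along p2 - p1 (unit length, no scaling). Works row-wise on (N, 2) arrays."""
    d = p2 - p1
    ex = d / np.linalg.norm(d, axis=1, keepdims=True)
    ey = np.column_stack([-ex[:, 1], ex[:, 0]])   # ex rotated by +90 degrees
    q = p - 0.5 * (p1 + p2)
    return np.sum(q * ex, axis=1), np.sum(q * ey, axis=1)

rng = np.random.default_rng(0)
sigma, N = 2.0, 200_000
p1, p2, p = (rng.normal(0.0, sigma, size=(N, 2)) for _ in range(3))
u, v = rigid_invariants(p1, p2, p)

print(u.var() / sigma**2, v.var() / sigma**2)   # both close to 1.5
r = 1.5 * sigma
print(np.mean(u**2 + v**2 <= r**2), 1 - np.exp(-r**2 / (3 * sigma**2)))
```

The two printed fractions on the last line should agree to within Monte Carlo error.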

When the point features $\mathbf{p}_{\mu_1}$, $\mathbf{p}_{\mu_2}$, and $\mathbf{p}$ are uniformly distributed over the unit disc, the above integral proves too difficult to evaluate analytically. We therefore resorted to non-linear parameter fitting, applying the Levenberg-Marquardt method [75] to synthetically generated data. We found that, in this case, the distribution of indices over the space of invariants can be approximated well by

Figure 4.2 shows the distribution of synthetically generated indices over the space of invariants, together with several of its contours.

Figure 4.2 The distribution over the space of invariants, and several of its contours, for point features uniformly distributed over the unit disc. The allowed transformation is rigid.

4.2 Similarity Transformation

We next examine the distribution of indices over the space of invariants for the similarity transformation: the models in the database may undergo rotation, translation and scaling.

If $\mathbf{p}_{\mu_1}$, $\mathbf{p}_{\mu_2}$, and $\mathbf{p}$ are the position vectors of three point features belonging to model $m$, then the tuple $(u, v)$ satisfying Eqn. 2.2 is unique and invariant under similarity transformations.

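If we further assume that the point features $\mathbf{p}_{\mu_1}$, $\mathbf{p}_{\mu_2}$, and $\mathbf{p}$ are generated by the Gaussian random process $\mathcal{N}\!\left(\mathbf{0}, \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}\right)$, then evaluation of the integral in expression 4.1 yields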
Note that this expression does not depend on the standard deviation of the Gaussian process generating the point features. This is to be expected. A feature set drawn with standard deviation $\sigma$ is simply a scaled copy of one drawn with unit standard deviation, and similarity invariants are unaffected by scaling. Figure 4.3 shows the distribution over the space of invariants and several of its contours.


Deriving the distribution of invariants has proven intractable when the point features are uniformly distributed over either the unit disc or the unit square.

4.3 Affine Transformation

The affine case is slightly different. Here the patterns of point features corresponding to the different models may undergo a general linear (affine) transformation, and such a transformation is uniquely defined by where three points, rather than two, are mapped.

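Assume that $\mathbf{p}_{\mu_1}$, $\mathbf{p}_{\mu_2}$, and $\mathbf{p}_{\mu_3}$ are the position vectors of three point features belonging to model $m$. Then the vectors $\mathbf{p}_{\mu_2} - \mathbf{p}_{\mu_1}$ and $\mathbf{p}_{\mu_3} - \mathbf{p}_{\mu_1}$ form a skewed basis, and thus a skewed coordinate system $Oxy$. Any other point $\mathbf{p}$ of model $m$ can be represented in this basis as

$$\mathbf{p} - \mathbf{p}_0 = u\,(\mathbf{p}_{\mu_2} - \mathbf{p}_{\mu_1}) + v\,(\mathbf{p}_{\mu_3} - \mathbf{p}_{\mu_1}), \tag{4.5}$$

where $\mathbf{p}_0 = \alpha\,\mathbf{p}_{\mu_1} + \beta\,\mathbf{p}_{\mu_2} + \gamma\,\mathbf{p}_{\mu_3}$ is the position vector of the center of the skewed coordinate system. The reason for expressing this point as a function of $\mathbf{p}_{\mu_1}$, $\mathbf{p}_{\mu_2}$, and $\mathbf{p}_{\mu_3}$ will become evident shortly.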

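Let us assume that the point features $\mathbf{p}_{\mu_1}$, $\mathbf{p}_{\mu_2}$, $\mathbf{p}_{\mu_3}$, and $\mathbf{p}$ are generated by the Gaussian random process $\mathcal{N}\!\left(\mathbf{0}, \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}\right)$. This is a two-dimensional Gaussian process with mean value $(0, 0)$ and covariance matrix $\begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}$. Standard probability theory then gives the joint distribution $f(u, v)$ of $u$ and $v$ as

$$f(u,v) = \int_{\mathbb{R}^6} f\big(x(u,v), y(u,v)\big)\, f(x_{\mu_1}, y_{\mu_1})\, f(x_{\mu_2}, y_{\mu_2})\, f(x_{\mu_3}, y_{\mu_3})\, |J|^{-1}\, dx_{\mu_1}\, dx_{\mu_2}\, dx_{\mu_3}\, dy_{\mu_1}\, dy_{\mu_2}\, dy_{\mu_3}, \tag{4.6}$$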

where $J$ is the Jacobian of the transformation. Evaluating this integral yields the following distribution of indices over the space of invariants:

$$f(u,v) = \frac{C}{\left(4u^2 + 4v^2 + 4uv + 4(\beta-\alpha)\,u + 4(\gamma-\alpha)\,v + 2(\alpha^2+\beta^2+\gamma^2+1)\right)^{3/2}}, \tag{4.7}$$

where $C$ is a normalization constant that makes $f(u, v)$ a probability density function.
Several observations can be made about the distribution over the space of invariants. First, the last equation shows that the contours of the distribution are second-order curves of the elliptic type. These curves have two axes of symmetry. One axis is at 45 degrees to the horizontal, and the other is perpendicular to it. This holds regardless of the choice of $\alpha$, $\beta$, and $\gamma$. The reason is that the quadratic part $4u^2 + 4uv + 4v^2$ does not involve these parameters, and its principal directions are $(1, 1)$ and $(1, -1)$.

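In addition to the axial symmetry, the curves also have a center of symmetry, which is located at

$$u_{cs} = \frac{\alpha - 2\beta + \gamma}{3}, \qquad v_{cs} = \frac{\alpha + \beta - 2\gamma}{3}. \tag{4.8}$$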
The center of symmetry coincides with the peak of the distribution, and its location yields a natural constraint on the pivot selection. Suppose we require that the center of symmetry coincide with the center of the coordinate system in the space of invariants, i.e., that

$$u_{cs} = 0 \;\wedge\; v_{cs} = 0. \tag{4.9}$$

It then follows from Eqn. 4.8 that

$$\alpha = \beta \;\wedge\; \beta = \gamma. \tag{4.10}$$
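For completeness, the step from Eqn. 4.7 to Eqns. 4.8 through 4.10 can be written out. The short worked derivation below is ours, not reproduced from the original text. It sets the gradient of the quadratic $Q(u, v)$ in the denominator of Eqn. 4.7 to zero; since $f \propto Q^{-3/2}$, the minimum of $Q$ is the peak of $f$.

$$\begin{aligned}
Q(u,v) &= 4u^2 + 4v^2 + 4uv + 4(\beta-\alpha)\,u + 4(\gamma-\alpha)\,v + 2(\alpha^2+\beta^2+\gamma^2+1),\\
\frac{\partial Q}{\partial u} &= 8u + 4v + 4(\beta-\alpha) = 0, \qquad
\frac{\partial Q}{\partial v} = 4u + 8v + 4(\gamma-\alpha) = 0,\\
&\Longrightarrow\; 2u + v = \alpha - \beta, \qquad u + 2v = \alpha - \gamma,\\
&\Longrightarrow\; u_{cs} = \frac{\alpha - 2\beta + \gamma}{3}, \qquad v_{cs} = \frac{\alpha + \beta - 2\gamma}{3},\\
u_{cs} = v_{cs} = 0 &\iff \alpha + \gamma = 2\beta \ \text{and}\ \alpha + \beta = 2\gamma \iff \alpha = \beta = \gamma.
\end{aligned}$$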


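The choice for the location of the center of the skewed coordinate system should now be clear. We set $\alpha = \beta = \gamma = 1/3$. The common value is $1/3$ so that $\alpha + \beta + \gamma = 1$, which makes $\mathbf{p}_0$ an affine combination of the basis points and hence carried along by any affine transformation. This choice requires the center to coincide with the barycenter of the triangle defined by $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3$. The barycenter is always well-defined, since $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3$ are assumed to be in general position.¹

¹ A similar result holds for the rigid and similarity transformations. It can easily be shown that the natural choice for $\mathbf{p}_0$ in these cases is the midpoint between the points defining the basis tuple.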
For this selection of $\alpha$, $\beta$, and $\gamma$, the distribution of the hash entries over the hash table becomes

$$f(u,v) = \frac{C}{\left(4u^2 + 4v^2 + 4uv + \frac{8}{3}\right)^{3/2}}. \tag{4.11}$$

As in the similarity case, the probability density function does not depend on the standard deviation of the Gaussian process generating the feature points. Figure 4.4 shows the distribution over the space of invariants and several of its contours.


Figure 4.4 The distribution over the space of invariants, and several of its contours, for point features generated by the Gaussian process $\mathcal{N}\!\left(\mathbf{0}, \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}\right)$. The allowed transformation is affine.
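Eqn. 4.11 can be checked numerically. Write $q = 4u^2 + 4uv + 4v^2$. A short calculation from Eqn. 4.11 (ours, not given in the text) yields $\Pr(q \le s) = 1 - \sqrt{(8/3)/(s + 8/3)}$. The median of $q$ is therefore $s = 8$, whatever the value of $\sigma$. The sketch below does the following:

- samples Gaussian features for two values of $\sigma$;
- computes the affine coordinates of Eqn. 4.5 with the barycentric pivot ($\alpha = \beta = \gamma = 1/3$) by Cramer's rule;
- compares the empirical distribution of $q$ with the prediction.

```python
import numpy as np

def affine_invariants(p1, p2, p3, p, alpha=1/3, beta=1/3, gamma=1/3):
    """Solve p - p0 = u (p2 - p1) + v (p3 - p1) row-wise (Eqn. 4.5), with
    p0 = alpha p1 + beta p2 + gamma p3, using Cramer's rule."""
    d2, d3 = p2 - p1, p3 - p1
    r = p - (alpha * p1 + beta * p2 + gamma * p3)
    det = d2[:, 0] * d3[:, 1] - d3[:, 0] * d2[:, 1]
    u = (r[:, 0] * d3[:, 1] - d3[:, 0] * r[:, 1]) / det
    v = (d2[:, 0] * r[:, 1] - r[:, 0] * d2[:, 1]) / det
    return u, v

def predicted_cdf(s):
    """Pr(4u^2 + 4uv + 4v^2 <= s) implied by Eqn. 4.11 (our derivation)."""
    return 1.0 - np.sqrt((8 / 3) / (s + 8 / 3))

thresholds = np.array([1.0, 8.0, 50.0])
rng = np.random.default_rng(0)
N = 200_000
results = {}
for sigma in (1.0, 5.0):
    p1, p2, p3, p = (rng.normal(0.0, sigma, size=(N, 2)) for _ in range(4))
    u, v = affine_invariants(p1, p2, p3, p)
    q = 4 * u**2 + 4 * u * v + 4 * v**2
    results[sigma] = np.array([np.mean(q <= s) for s in thresholds])
    print(sigma, np.round(results[sigma], 3), np.round(predicted_cdf(thresholds), 3))
```

Using the same thresholds for both values of $\sigma$ makes the claimed independence from the standard deviation directly visible in the output.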


We next extend our analysis to model point features that are uniformly distributed over a convex domain, for example the unit disc or the unit square. In principle, one could attempt to evaluate the integral in expression 4.6. Unfortunately, this multiple integral proved too difficult to evaluate analytically, and non-linear parameter fitting did not help either. However, as we will see below, a qualitative description of the distribution is possible.

The barycenter is a natural position for the center of the skewed coordinate system. It is not at all clear, however, whether the above analysis carries over to uniformly distributed model features. Consequently, in what follows we assume that the pivot coincides with $\mathbf{p}_1$, as originally described in [61].


Figure 4.5 Correspondence between feature and hash space regions: if $\mathbf{p}$ lies in the region of the feature space marked $i$, the computed invariant tuple will lie in the region $i'$ of the space of invariants. (Panels: Feature Space, showing $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3$ inside a convex domain $K$, with the lines $(e_1), (e_2), (e_3)$ and regions 1 to 7; Space of Invariants, with the points $(0,0)$, $(1,0)$, $(0,1)$ and regions 1' to 7'.)

Let $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ be three points of $\mathbb{R}^2$ in general position (Figure 4.5). Lines $(e_1)$, $(e_2)$, and $(e_3)$ divide the feature and hash spaces into seven regions. If the point $\mathbf{p}$ lies in the region of the feature space marked $i \in \{1, 2, \ldots, 7\}$, the generated invariant tuple will lie in the region of the space of invariants marked $i'$. If $\mathbf{p}$ lies in any of the odd-numbered regions, the quadrangle formed by the four points is reentrant. If it lies in any of the even-numbered regions, the quadrangle is convex.

To describe the distribution over the hash table qualitatively, we seek the probability that the generated invariant tuple lies in a given quadrant of the space of invariants. To this end, we use the answer to a famous problem from geometric probability, Sylvester's Vierpunkt (four-point) problem [4]. The problem can be stated as follows: "given a convex domain $K$, find the probability that four points taken at random inside $K$ will form a reentrant quadrilateral." The answer varies with the shape of $K$; for regular polygons and circles/ellipses, this probability is very close to 0.3.

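Let $\Pr(C)$ (respectively $\Pr(R)$) denote the probability that the quadrilateral formed by $\mathbf{p}, \mathbf{p}_1, \mathbf{p}_2$ and $\mathbf{p}_3$ is convex (respectively reentrant). From Figure 4.5, a symmetry argument shows that

$$\Pr(\mathbf{p} \text{ in } 2) = \Pr(\mathbf{p} \text{ in } 4) = \Pr(\mathbf{p} \text{ in } 6) = \tfrac{1}{3}\Pr(C),$$

$$\Pr(\mathbf{p} \text{ in } 1) = \Pr(\mathbf{p} \text{ in } 3) = \Pr(\mathbf{p} \text{ in } 5) = \Pr(\mathbf{p} \text{ in } 7) = \tfrac{1}{4}\Pr(R).$$

In a reentrant quadrilateral, exactly one of the four points lies inside the triangle formed by the other three, and each of the four points is equally likely to be that point. Likewise, in the convex case, each of the three even-numbered regions is equally likely.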

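We can then evaluate the desired probabilities as follows:

$$\begin{aligned}
\Pr(\text{tuple in 1st quadrant}) &= \Pr(\text{tuple in } 6' \text{ or } 7') = \Pr(\mathbf{p} \text{ in } 6) + \Pr(\mathbf{p} \text{ in } 7)\\
&= \tfrac{1}{3}\Pr(C) + \tfrac{1}{4}\Pr(R) = \tfrac{1}{3}\big(1 - \Pr(R)\big) + \tfrac{1}{4}\Pr(R)\\
&= \tfrac{1}{3} - \tfrac{1}{12}\Pr(R).
\end{aligned}$$

The same reasoning gives

$$\begin{aligned}
\Pr(\text{tuple in 2nd quadrant}) &= \tfrac{1}{3} - \tfrac{1}{12}\Pr(R),\\
\Pr(\text{tuple in 3rd quadrant}) &= \tfrac{1}{4}\Pr(R),\\
\Pr(\text{tuple in 4th quadrant}) &= \tfrac{1}{3} - \tfrac{1}{12}\Pr(R).
\end{aligned}$$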
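The quadrant probabilities are affine functions of $\Pr(R)$, so they can be evaluated exactly with rational arithmetic once $\Pr(R)$ is known. The small helper below (a hypothetical name; it uses only the standard-library `fractions` module) reproduces the square case, $\Pr(R) = 11/36$, discussed next. It also checks that the four probabilities sum to one.

```python
from fractions import Fraction

def quadrant_probabilities(pr_reentrant):
    """Quadrant probabilities (1st, 2nd, 3rd, 4th) of the invariant tuple,
    pivot at p1, as functions of Pr(R) from Sylvester's problem."""
    pr_r = Fraction(pr_reentrant)
    side = Fraction(1, 3) - pr_r / 12   # 1st, 2nd and 4th quadrants
    third = pr_r / 4                    # 3rd quadrant
    return side, side, third, side

probs = quadrant_probabilities(Fraction(11, 36))   # square domain
print(probs)       # (Fraction(133, 432), Fraction(133, 432), Fraction(11, 144), Fraction(133, 432))
print(sum(probs))  # 1
```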

The value of $\Pr(R)$ in the above formulas is given by the solution to Sylvester's problem for the given convex domain $K$. When $K$ is the unit disc, $\Pr(R) = 35/(12\pi^2)$, and

$$\begin{aligned}
\Pr(\text{tuple in 1st quadrant}) &= \tfrac{1}{3} - \tfrac{35}{144\pi^2},\\
\Pr(\text{tuple in 2nd quadrant}) &= \tfrac{1}{3} - \tfrac{35}{144\pi^2},\\
\Pr(\text{tuple in 3rd quadrant}) &= \tfrac{35}{48\pi^2},\\
\Pr(\text{tuple in 4th quadrant}) &= \tfrac{1}{3} - \tfrac{35}{144\pi^2}.
\end{aligned}$$
In other words, the first, second, and fourth quadrants each contain the same expected number of hash entries (about 30.9% of the total each). Only a very small percentage of the entries (less than 8% of the total) reside in the third quadrant! Consequently, the resulting distribution has only one axis of symmetry. This is a rather unexpected result, given the shape of the domain $K$ (the unit disc). If $K$ is a square, the value of $\Pr(R)$ is $11/36$, and substituting this value into the above formulas gives

$$\begin{aligned}
\Pr(\text{tuple in 1st quadrant}) &= \tfrac{133}{432},\\
\Pr(\text{tuple in 2nd quadrant}) &= \tfrac{133}{432},\\
\Pr(\text{tuple in 3rd quadrant}) &= \tfrac{11}{144},\\
\Pr(\text{tuple in 4th quadrant}) &= \tfrac{133}{432}.
\end{aligned}$$

Figure 4.6 Several of the contours of the hash table distribution. The model features are uniformly distributed over the unit disc, and the allowed transformation is affine.

In other words, the asymmetry persists. Figure 4.6 shows several of the contours of the distribution over the hash table when $K$ is the unit disc; this distribution was obtained by means of a Monte Carlo simulation. As predicted by the above analysis, practically all of the hash entries lie in the first, second and fourth quadrants of the hash space.
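The unit-disc predictions can be reproduced with a small Monte Carlo experiment of the kind used for Figure 4.6. This is our own sketch, not the original simulation code. Points are drawn uniformly over the unit disc by taking the radius as the square root of a uniform variate. The affine coordinates use the pivot $\mathbf{p}_1$, i.e., $\mathbf{p} - \mathbf{p}_1 = u(\mathbf{p}_2 - \mathbf{p}_1) + v(\mathbf{p}_3 - \mathbf{p}_1)$.

```python
import numpy as np

def uniform_disc(rng, N):
    """N points uniformly distributed over the unit disc."""
    radius = np.sqrt(rng.uniform(0.0, 1.0, N))
    angle = rng.uniform(0.0, 2.0 * np.pi, N)
    return np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])

rng = np.random.default_rng(0)
N = 400_000
p1, p2, p3, p = (uniform_disc(rng, N) for _ in range(4))
d2, d3, r = p2 - p1, p3 - p1, p - p1
det = d2[:, 0] * d3[:, 1] - d3[:, 0] * d2[:, 1]      # Cramer's rule for (u, v)
u = (r[:, 0] * d3[:, 1] - d3[:, 0] * r[:, 1]) / det
v = (d2[:, 0] * r[:, 1] - r[:, 0] * d2[:, 1]) / det

observed = [np.mean((u > 0) & (v > 0)), np.mean((u < 0) & (v > 0)),
            np.mean((u < 0) & (v < 0)), np.mean((u > 0) & (v < 0))]
pr_r = 35 / (12 * np.pi**2)                          # Sylvester's answer for the disc
predicted = [1/3 - pr_r / 12, 1/3 - pr_r / 12, pr_r / 4, 1/3 - pr_r / 12]
print(np.round(observed, 4))
print(np.round(predicted, 4))   # [0.3087 0.3087 0.0739 0.3087]
```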

The above results also hold qualitatively when the pivot coincides with the barycenter.
Chapter 5


Parallelism Revisited


In this chapter, we present a number of enhancements to the geometric hashing method. In particular, hash table equalization and the exploitation of symmetries are developed specifically for the parallel algorithms. These techniques lead to substantial performance improvements, and they also apply to more general implementations of indexing-based object recognition methods.

As indicated in Section 3.7, the non-uniform occupancy of the hash bins produces hash entry lists of different lengths. During the histogramming phase of the algorithm, the time needed to traverse these lists is dominated by the longest one. A uniform distribution of the entries over the hash table is therefore desirable: it reduces execution time and stores the hash table data structure efficiently.

Independently of the efficiencies gained by rehashing, we may exploit certain symmetries in the pattern in which hash entries are stored in the hash table. Doing so yields further savings in computation and storage.
5.1 Rehashing


We next describe a method for transforming the coordinates of point locations so that the equispaced quantizer in the space of invariants yields an expected uniform distribution.

One must first determine the expected probability density $f(u, v)$ of the untransformed coordinates (tuple of invariants). As described in chapter 4, this can be done in one of two ways. One can fit a parametric model to synthetically generated data. Alternatively, one can calculate the expected distribution analytically from a model of how point features are distributed in the plane.

Once the probability density $f(u, v)$ is known, a transformation that maps the original distribution to the uniform distribution over a closed region (in particular, a rectangle) must be computed. This new mapping is effectively a hashing function

$$ h : \mathbb{R}^2 \to \mathbb{R}^2, $$

and is used to evenly distribute the bin entries over a rectangular hash table. When the appropriate remapping function is applied to the invariants computed using any one of Eqns. 2.1, 2.2, or 2.3, the equally spaced hash bins in the remapped space have a uniform expected population. Henceforth, this function will be called a rehashing function.

Case of Rigid Transformations. In Section 4.1 we derived formulas for the distribution $f(u, v)$ of indices over the space of invariants for the case where the point features were either generated by the Gaussian random process $\mathcal{N}\left(0, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}\right)$, or uniformly distributed over the unit disc.
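We begin with the case of Gaussian distributed point features. The distribution of indices over the space of invariants was shown to be

$$ f(u, v) = \frac{1}{3\pi\sigma^2}\, e^{-\frac{u^2 + v^2}{3\sigma^2}}. $$

The expression can be rewritten in polar coordinates $(\rho, \theta)$ as follows:

$$ f(\rho, \theta) = \frac{1}{3\pi\sigma^2}\, e^{-\frac{\rho^2}{3\sigma^2}}, \qquad \rho \in [0, \infty),\ \theta \in [0, 2\pi). $$

Observing that

$$ \int_0^{\rho_0} \!\! \int_0^{2\pi} \rho\, f(\rho, \theta)\, d\theta\, d\rho = 1 - e^{-\frac{\rho_0^2}{3\sigma^2}}, $$

we conclude that the rehashing function is given by

$$ h(u, v) = \left( 1 - e^{-\frac{u^2 + v^2}{3\sigma^2}},\ \operatorname{atan2}(v, u) \right). \tag{5.1} $$

The first component maps each radius to the fraction of entries lying within it, and is therefore uniformly distributed over $[0, 1)$; the polar angle is already uniform, since $f$ depends only on $\rho$. The function atan2(·, ·) converts rectangular coordinates to polar coordinates by returning the phase in the interval $[-\pi, \pi]$. Figure 5.1 shows the result of hash table equalization for the case of rigid transformations, and several of the corresponding contours. The point features were generated synthetically by the Gaussian process $\mathcal{N}\left(0, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}\right)$.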
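Eqn. 5.1 can be checked empirically. The Python sketch below (ours) assumes that the rigid-case invariants are the coordinates of p relative to the midpoint of the basis pair, with the x-axis along $p_2 - p_1$ and lengths preserved; under that convention the invariants follow the Gaussian density above. It samples Gaussian point triples, rehashes their invariants, and histograms the rehashed abscissa, which should come out close to uniform on [0, 1).

```python
import math
import random


def rigid_invariants(p, p1, p2):
    """Assumed rigid convention: coordinates of p in the frame centered at the
    midpoint of (p1, p2), with x-axis along p2 - p1 and lengths preserved."""
    mx, my = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(ex, ey)
    dx, dy = p[0] - mx, p[1] - my
    return (dx * ex + dy * ey) / norm, (-dx * ey + dy * ex) / norm


def rehash_rigid_gaussian(u, v, sigma):
    """Eqn. 5.1: fraction of entries within radius rho, paired with the polar angle."""
    return 1.0 - math.exp(-(u * u + v * v) / (3.0 * sigma * sigma)), math.atan2(v, u)


random.seed(1)
sigma = 2.0
counts = [0] * 10  # histogram of the rehashed abscissa over [0, 1)
for _ in range(20_000):
    p, p1, p2 = [(random.gauss(0, sigma), random.gauss(0, sigma)) for _ in range(3)]
    r, _theta = rehash_rigid_gaussian(*rigid_invariants(p, p1, p2), sigma)
    counts[min(int(r * 10), 9)] += 1
print(counts)  # roughly 2000 per bin
```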

Figure 5.1 Hash table equalization for the case of rigid transformations and point features generated by the Gaussian process $\mathcal{N}\left(0, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}\right)$. Left: the expected distribution of remapped invariants. Right: several of the distribution's contours.

When the model point features are uniformly distributed over the unit disc, the distribution of indices over the space of invariants can be well approximated by a function of the form

$$ f(u, v) \propto \frac{1}{\left( \left( (4.7u)^2 + (3.9v)^2 \right)^2 + 36.7 \right)^2}. $$

Repeating the above analysis, we can show that the rehashing function in this case is

$$ h(u, v) = \left( \frac{2}{\pi} \operatorname{atan}\!\left( \frac{(au)^2 + (bv)^2}{c^2} \right) + \frac{2c^2}{\pi}\, \frac{(au)^2 + (bv)^2}{\left( (au)^2 + (bv)^2 \right)^2 + c^4},\ \operatorname{atan2}(v, u) \right), \tag{5.2} $$

where $a = 4.7$, $b = 3.9$, $c = (36.7)^{1/4}$, and atan(·) returns the arctangent of its argument in the interval $[-\pi/2, \pi/2]$. In Figure 5.2 we show the result of hash table equalization, and several of the corresponding contours. As can be seen, the remapping is very efficient.
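The closed form of Eqn. 5.2 follows the same recipe. With $x = au$ and $y = bv$ the fitted density depends only on $s = x^2 + y^2$, and, since $2\pi r\,dr = \pi\,ds$, the fraction of entries with $s \le S$ is proportional to $\int_0^S ds/(s^2 + c^4)^2$. The Python sketch below is our own check, not part of the thesis: it compares the closed-form abscissa with a direct numerical integration of that fraction.

```python
import math

A, B = 4.7, 3.9
C = 36.7 ** 0.25  # so that C**4 == 36.7


def rehash_rigid_disc(u, v):
    """Closed-form radial mass of the fitted density, paired with the polar angle.
    Since A != B, the angle atan2(v, u) is only approximately uniform."""
    s = (A * u) ** 2 + (B * v) ** 2
    r = (2 / math.pi) * (math.atan(s / C**2) + C**2 * s / (s * s + C**4))
    return r, math.atan2(v, u)


def radial_mass_numeric(s_max, steps=100_000):
    """Midpoint-rule integral of 1/(s^2 + c^4)^2 over [0, s_max], normalized by
    its value over [0, inf), which is pi / (4 c^6)."""
    h = s_max / steps
    total = sum(1.0 / (((k + 0.5) * h) ** 2 + C**4) ** 2 for k in range(steps)) * h
    return total / (math.pi / (4 * C**6))


for u, v in [(0.1, 0.0), (0.2, 0.3), (0.5, 0.5)]:
    s = (A * u) ** 2 + (B * v) ** 2
    print(u, v, rehash_rigid_disc(u, v)[0], radial_mass_numeric(s))
```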


Figure 5.2 Hash table equalization for the case of rigid transformations and point features uniformly distributed over the unit disc. Left: the expected distribution of remapped invariants. Right: several of the distribution's contours.

Case of Similarity Transformations. For the case of the similarity transformation, we have determined the distribution of indices only when the point features are generated by a Gaussian process $\mathcal{N}\left(0, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}\right)$. The expression for that distribution was shown to be independent of $\sigma$, and equal to

$$ f(u, v) = \frac{12}{\pi}\, \frac{1}{\left( 4(u^2 + v^2) + 3 \right)^2}. $$

An analysis similar to the one carried out for the case of rigid transformations allows us to derive the following rehashing function:

$$ h(u, v) = \left( \frac{4(u^2 + v^2)}{4(u^2 + v^2) + 3},\ \operatorname{atan2}(v, u) \right). \tag{5.3} $$

Use of this rehashing function allows the remapping of the computed invariants and results in the distribution shown in Figure 5.3. Again, the remapping is very efficient.
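Eqn. 5.3 can be checked end to end. The Python sketch below (ours) computes similarity invariants as in Eqns. 6.1 and 6.2 (origin at the midpoint of the basis pair), draws point features from a Gaussian, and histograms both rehashed components. Because the density does not depend on $\sigma$, the value 5.0 used here is arbitrary.

```python
import math
import random


def similarity_invariants(p, p1, p2):
    """(u, v) with p - p0 = u (p2 - p1) + v (p2 - p1)_perp and p0 = (p1 + p2) / 2."""
    mx, my = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    n2 = ex * ex + ey * ey
    dx, dy = p[0] - mx, p[1] - my
    return (dx * ex + dy * ey) / n2, (-dx * ey + dy * ex) / n2


def rehash_similarity_gaussian(u, v):
    """Eqn. 5.3: radial mass 4 rho^2 / (4 rho^2 + 3), paired with the polar angle."""
    q = 4.0 * (u * u + v * v)
    return q / (q + 3.0), math.atan2(v, u)


random.seed(2)
radial, angular = [0] * 10, [0] * 8
for _ in range(20_000):
    p, p1, p2 = [(random.gauss(0, 5.0), random.gauss(0, 5.0)) for _ in range(3)]
    r, theta = rehash_similarity_gaussian(*similarity_invariants(p, p1, p2))
    radial[min(int(r * 10), 9)] += 1
    angular[min(int((theta + math.pi) / (2 * math.pi) * 8), 7)] += 1
print(radial)   # roughly 2000 per bin
print(angular)  # roughly 2500 per bin
```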

Figure 5.3 Hash table equalization for the case of similarity transformations and point features generated by the Gaussian process $\mathcal{N}\left(0, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}\right)$. Left: the expected distribution of remapped invariants. Right: several of the distribution's contours.

Case of Affine Transformations. We finally repeat the above analysis for the case of affine transformations where the point features are generated by a Gaussian process $\mathcal{N}\left(0, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}\right)$. As was determined in Section 4.3, the distribution of indices in the space of invariants is given by

$$ f(u, v) = \frac{2\sqrt{2}}{\pi}\, \frac{1}{\left( 4u^2 + 4v^2 + 4uv + 8/3 \right)^{3/2}}. $$


The  appropriate  rehashing  function  for  this  case  can  be  shown  to  be 


In Figure 5.4, we show the result of hash table equalization, and several of the corresponding contours. As can be seen, the remapping is very efficient.
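A rehashing function for the affine case can be obtained with the same recipe as in the rigid and similarity cases. The version below is our own derivation from the density above and may differ from Eqn. 5.4, for instance in its angle convention. In the coordinates $a = \sqrt{3}\,(u + v)/\sqrt{2}$ and $b = (v - u)/\sqrt{2}$ the density is isotropic, $f \propto (a^2 + b^2 + 4/3)^{-3/2}$, and integrating it radially gives

$$ h(u, v) = \left( 1 - \frac{2}{\sqrt{6\,(u^2 + uv + v^2) + 4}},\ \operatorname{atan2}\!\big(v - u,\ \sqrt{3}\,(u + v)\big) \right). $$

The Python sketch below checks this by sampling. It additionally assumes (our assumption) that the invariants of Eqn. 2.3 are measured from the barycenter of the three basis points, with basis vectors $p_{\mu_2} - p_{\mu_1}$ and $p_{\mu_3} - p_{\mu_1}$.

```python
import math
import random


def affine_invariants(p, b1, b2, b3):
    """(u, v) with p - g = u (b2 - b1) + v (b3 - b1), where g is the barycenter of
    the basis triple (assumed convention); solved by Cramer's rule."""
    gx, gy = (b1[0] + b2[0] + b3[0]) / 3, (b1[1] + b2[1] + b3[1]) / 3
    e1x, e1y = b2[0] - b1[0], b2[1] - b1[1]
    e2x, e2y = b3[0] - b1[0], b3[1] - b1[1]
    dx, dy = p[0] - gx, p[1] - gy
    det = e1x * e2y - e1y * e2x
    return (dx * e2y - dy * e2x) / det, (e1x * dy - e1y * dx) / det


def rehash_affine_gaussian(u, v):
    """Derived rehash (see above): elliptical radial mass and an angle that is
    uniform because the density is isotropic in the (a, b) coordinates."""
    r = 1.0 - 2.0 / math.sqrt(6.0 * (u * u + u * v + v * v) + 4.0)
    return r, math.atan2(v - u, math.sqrt(3.0) * (u + v))


random.seed(3)
radial, angular = [0] * 10, [0] * 8
for _ in range(20_000):
    p, b1, b2, b3 = [(random.gauss(0, 1.0), random.gauss(0, 1.0)) for _ in range(4)]
    r, theta = rehash_affine_gaussian(*affine_invariants(p, b1, b2, b3))
    radial[min(int(r * 10), 9)] += 1
    angular[min(int((theta + math.pi) / (2 * math.pi) * 8), 7)] += 1
print(radial)   # roughly 2000 per bin
print(angular)  # roughly 2500 per bin
```

Swapping u and v leaves the abscissa unchanged and negates the angle, which is the symmetry exploited in Section 5.2.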


5.2  Symmetries  and  Foldings 


In the previous section, we described the use of rehashing functions, which decrease the computational requirements. Additional savings in both computational and storage requirements are also possible. Certain symmetries exist in the storage pattern of entries in the hash table; these symmetries are independent of the use of the rehashing functions given in Section 5.1.

Figure 5.4 Hash table equalization for the case of affine transformations and point features generated by the Gaussian process $\mathcal{N}\left(0, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}\right)$. Left: the expected distribution of remapped invariants. Right: several of the distribution's contours.

Let $p_{\mu_1}$ and $p_{\mu_2}$ be a basis pair comprising point features that belong to model $m$, and let $(u, v)$ be the coordinates of point $p$ of $m$ in the coordinate system defined by $p_{\mu_1}, p_{\mu_2}$. Then, there will be an entry of the form $(m, (\mu_1, \mu_2))$ at location $(u, v)$ of the hash table (or at the rehashed position $h(u, v)$). If the tuple $(p_{\mu_2}, p_{\mu_1})$ is used instead to form a basis pair, and thus a coordinate system, we observe that the coordinates of $p$ will be $(-u, -v)$: the origin (the midpoint of the two basis points) is unchanged, while the basis vector is reversed. That is, at location $(-u, -v)$ of the hash table there will be an entry of the form $(m, (\mu_2, \mu_1))$. This holds true for both the rigid and similarity transformations. Due to the symmetry of the rehashing functions (see Eqns. 5.1-5.3), the rehashed points $h(u, v)$ and $h(-u, -v)$ are still related: they have the same abscissa, and their polar angles differ by $\pi$. Figure 5.5 details the above observations for the special case of model $M_1$ and the basis tuples $(p_4, p_1)$, $(p_1, p_4)$.
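The relation between the two basis orderings is easy to verify numerically. The Python sketch below (ours) uses similarity invariants as in Eqns. 6.1 and 6.2 and the rehashing function of Eqn. 5.3, and checks that reversing the basis pair negates $(u, v)$, keeps the rehashed abscissa, and shifts the angle by $\pi$.

```python
import math
import random


def similarity_invariants(p, p1, p2):
    """(u, v) with p - p0 = u (p2 - p1) + v (p2 - p1)_perp and p0 = (p1 + p2) / 2."""
    mx, my = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    n2 = ex * ex + ey * ey
    dx, dy = p[0] - mx, p[1] - my
    return (dx * ex + dy * ey) / n2, (-dx * ey + dy * ex) / n2


def rehash(u, v):
    """Eqn. 5.3."""
    q = 4.0 * (u * u + v * v)
    return q / (q + 3.0), math.atan2(v, u)


def check_swap(p, p1, p2, tol=1e-12):
    u, v = similarity_invariants(p, p1, p2)
    u_rev, v_rev = similarity_invariants(p, p2, p1)  # basis order reversed
    r, theta = rehash(u, v)
    r_rev, theta_rev = rehash(u_rev, v_rev)
    angle_gap = (theta - theta_rev) % (2 * math.pi)
    return (abs(u + u_rev) < tol and abs(v + v_rev) < tol
            and abs(r - r_rev) < tol and abs(angle_gap - math.pi) < 1e-9)


random.seed(4)
trials = [[(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
          for _ in range(1000)]
print(all(check_swap(*t) for t in trials))
```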

A similar observation can be made for the case of the affine transformation as well. Let $p$, $p_{\mu_1}$, $p_{\mu_2}$ and $p_{\mu_3}$ be an arrangement of four points belonging to model $m$. There are a total of $4! = 24$ distinct affine bases (ordered triples) that one can form using points from this point set. Two such bases are $(p_{\mu_1}, p_{\mu_2}, p_{\mu_3})$ and $(p_{\mu_1}, p_{\mu_3}, p_{\mu_2})$. From Eqn. 2.3 we can see that the coordinates of $p$ in the coordinate systems defined by the two (skewed) bases will be $(u, v)$ and $(v, u)$, respectively. In other words, the two hash entries corresponding to the two basis tuples will be symmetric with respect to the line $u = v$, which is at a 45-degree angle with the horizontal axis. At location $(u, v)$ of the hash table there will be an entry of the form $(m, (\mu_1, \mu_2, \mu_3))$, whereas at location $(v, u)$ there will be an entry of the form $(m, (\mu_1, \mu_3, \mu_2))$. As a result of the symmetry of the rehashing function in Eqn. 5.4, the rehashed points $h(u, v)$ and $h(v, u)$ are still related: they have the same abscissa, and $\theta$ values of opposite sign. In Figure 5.6, we graphically depict the above observations for the case of model $M_4$ and the basis tuples $(p_4, p_5, p_1)$, $(p_4, p_1, p_5)$.
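An analogous check for the affine case swaps the last two basis points. The Python sketch below (ours) assumes that the invariants of Eqn. 2.3 are measured from the barycenter of the basis triple, and it uses a rehashing function derived by us from the affine density of Section 5.1, $h(u, v) = \big(1 - 2/\sqrt{6(u^2 + uv + v^2) + 4},\ \operatorname{atan2}(v - u, \sqrt{3}(u + v))\big)$, which is not necessarily identical to Eqn. 5.4.

```python
import math
import random


def affine_invariants(p, b1, b2, b3):
    """(u, v) with p - g = u (b2 - b1) + v (b3 - b1); g = barycenter (assumed)."""
    gx, gy = (b1[0] + b2[0] + b3[0]) / 3, (b1[1] + b2[1] + b3[1]) / 3
    e1x, e1y = b2[0] - b1[0], b2[1] - b1[1]
    e2x, e2y = b3[0] - b1[0], b3[1] - b1[1]
    dx, dy = p[0] - gx, p[1] - gy
    det = e1x * e2y - e1y * e2x
    return (dx * e2y - dy * e2x) / det, (e1x * dy - e1y * dx) / det


def rehash_affine(u, v):
    """Derived rehash (our assumption, see the lead-in)."""
    r = 1.0 - 2.0 / math.sqrt(6.0 * (u * u + u * v + v * v) + 4.0)
    return r, math.atan2(v - u, math.sqrt(3.0) * (u + v))


def close(a, b):
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)


def check_swap(p, b1, b2, b3):
    """Basis (mu1, mu3, mu2) should map (u, v) to (v, u); the rehashed points keep
    their abscissa and flip the sign of the angle."""
    u, v = affine_invariants(p, b1, b2, b3)
    u_sw, v_sw = affine_invariants(p, b1, b3, b2)
    r, theta = rehash_affine(u, v)
    r_sw, theta_sw = rehash_affine(u_sw, v_sw)
    angle_sum = math.remainder(theta + theta_sw, 2 * math.pi)
    return close(u, v_sw) and close(v, u_sw) and close(r, r_sw) and abs(angle_sum) < 1e-9


random.seed(11)
trials = [[(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
          for _ in range(1000)]
print(all(check_swap(*t) for t in trials))
```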

[Figure 5.5 panel labels: $(\rho, \theta) = h(u, v)$ and $(\rho, \theta - \pi) = h(-u, -v)$.]

Figure 5.5 Symmetries in the storage pattern of the hash entries. Top: if no rehashing is used, the hash entries are symmetric with respect to the center of the coordinate system of the space of invariants. Bottom: when rehashing is used, the rehashed entries have the same abscissa but are a distance $\pi$ apart. These observations hold true for both the rigid and similarity transformations.

Figure 5.6 Symmetries in the storage pattern of the hash entries under the affine transformation. Top: if no rehashing is used, the hash entries are symmetric with respect to the line that is at a 45-degree angle with the horizontal axis. Bottom: when rehashing is used, the rehashed entries have the same abscissa and $\theta$ values of opposite signs.

The result of these symmetries is that for every entry in a certain half of the hash table, there is an equivalent entry in the other half, the only change being that the basis tuple (or part of it) is reversed. Thus, we can dispose of half the hash table: during the recognition phase, when a hash occurs to the missing half of the table, the corresponding entry can be generated from the stored one, at the expense of minimal bookkeeping. Accordingly, only half the hash table needs to be stored, and the entry lists therefore become, on average, half as long when spread among the existing set of processors. Consequently, a speedup of two, on average, can be expected. It is interesting to note that this speedup occurs because of decreased storage requirements, and not because of a doubling of the computational capacity.
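The bookkeeping can be sketched in a few lines of Python. The sketch is ours and rests on several assumptions: similarity invariants as in Eqns. 6.1 and 6.2, a hypothetical bin width `W`, bins indexed with Python's `round` (which is symmetric about zero, so the mirror of bin (i, j) is (-i, -j)), and the upper half-plane as the stored half. Lookups into the missing half read the mirrored bin and reverse each stored basis pair.

```python
import random
from collections import defaultdict
from itertools import permutations

W = 0.1  # hypothetical bin width


def similarity_invariants(p, p1, p2):
    mx, my = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    n2 = ex * ex + ey * ey
    dx, dy = p[0] - mx, p[1] - my
    return (dx * ex + dy * ey) / n2, (-dx * ey + dy * ex) / n2


def bin_of(u, v):
    # round() is symmetric about zero, so bin_of(-u, -v) == (-i, -j).
    return round(u / W), round(v / W)


def in_stored_half(b):
    i, j = b
    return j > 0 or (j == 0 and i >= 0)


def hash_entries(models):
    """Yield (bin, (model, ordered basis)) for every ordered basis pair."""
    for m, pts in enumerate(models):
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i, j):
                    yield bin_of(*similarity_invariants(p, pts[i], pts[j])), (m, (i, j))


def build(models, half=False):
    table = defaultdict(list)
    for b, entry in hash_entries(models):
        if not half or in_stored_half(b):
            table[b].append(entry)
    return table


def lookup_half(table, b):
    if in_stored_half(b):
        return list(table.get(b, []))
    # Missing half: read the mirrored bin and reverse each basis pair.
    mirror = (-b[0], -b[1])
    return [(m, (j, i)) for m, (i, j) in table.get(mirror, [])]


random.seed(5)
models = [[(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)] for _ in range(5)]
full, half = build(models), build(models, half=True)
n_full = sum(map(len, full.values()))
n_half = sum(map(len, half.values()))
print(n_full, n_half)
```

The only entries stored twice are those in the self-mirrored bin (0, 0), which is why the half table is slightly more than half the size of the full one.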

Clearly, the computational savings accrued by the use of rehashing are irrelevant for the case of the broadcast algorithm. However, by making use of the symmetries in the storage pattern of the hash entries, the storage requirements for the broadcast algorithm can be reduced by a factor of two, leading to a corresponding reduction of the VPR value, and thus a speedup.

A further possibility is to perform a folding of the space of invariants. For example, when a hash occurs to the missing half-plane, rather than generating the entries from the list of entries in the associated position of the other half-plane, we can instead register a vote for the entire hash bin in the existing half-plane. In the case of similarity transformations, for example, this operation will in effect confuse entries of the form $(m, (\mu_1, \mu_2))$ with entries of the form $(m, (\mu_2, \mu_1))$: the basis tuples thus become basis sets, and $(p_{\mu_1}, p_{\mu_2})$ is the same as $(p_{\mu_2}, p_{\mu_1})$. Although this means that a particular (model, basis) pair may receive more votes than it actually deserves, we have encountered no difficulties with this method.
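Folding changes only what is stored and what a vote is cast for. In the Python sketch below (ours, under the same assumptions as a plain similarity hash table: Eqns. 6.1 and 6.2, a hypothetical bin width `W`, and symmetric rounding), every bin is mapped to its representative in the stored half-plane, and entries are keyed by unordered basis sets (`frozenset`). Votes that the full table would split between $(m, (\mu_1, \mu_2))$ and $(m, (\mu_2, \mu_1))$ now accumulate on the same key.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import permutations

W = 0.1  # hypothetical bin width


def similarity_invariants(p, p1, p2):
    mx, my = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    n2 = ex * ex + ey * ey
    dx, dy = p[0] - mx, p[1] - my
    return (dx * ex + dy * ey) / n2, (-dx * ey + dy * ex) / n2


def bin_of(u, v):
    return round(u / W), round(v / W)  # symmetric: bin_of(-u, -v) == (-i, -j)


def fold(b):
    """Representative of bin b in the stored half-plane."""
    i, j = b
    return b if j > 0 or (j == 0 and i >= 0) else (-i, -j)


def build_tables(models):
    """Full table keyed by ordered bases, folded table keyed by basis sets."""
    full, folded = defaultdict(list), defaultdict(list)
    for m, pts in enumerate(models):
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k in (i, j):
                    continue
                b = bin_of(*similarity_invariants(p, pts[i], pts[j]))
                full[b].append((m, (i, j)))
                if fold(b) == b:  # keep only the stored half-plane
                    folded[b].append((m, frozenset((i, j))))
    return full, folded


def vote_folded(folded, scene, s1, s2):
    """Cast one vote per hit (model, basis set) for the scene probe (s1, s2)."""
    votes = Counter()
    for k, p in enumerate(scene):
        if k not in (s1, s2):
            b = bin_of(*similarity_invariants(p, scene[s1], scene[s2]))
            votes.update(folded.get(fold(b), []))
    return votes


random.seed(6)
models = [[(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)] for _ in range(5)]
full, folded = build_tables(models)

# Scene: model 0 rotated by 0.7 rad, scaled by 2.5 and translated.
c, s = 2.5 * math.cos(0.7), 2.5 * math.sin(0.7)
scene = [(c * x - s * y + 3.0, s * x + c * y - 1.0) for x, y in models[0]]
votes = vote_folded(folded, scene, 0, 1)
print(votes[(0, frozenset((0, 1)))])
```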

5.3  Timing  Results 

In order to demonstrate the validity of our ideas, we tested the connectionist algorithm using synthetically generated models of 16 point features; the point features were generated by a Gaussian process. We only demonstrate the performance enhancements of rehashing. The transformation class was the set of similarity transformations. During the building of the hash table, the rehashing function of Eqn. 5.3 was used. After generating 1024 such models, we constructed scenes of approximately 200 points, each with a single model embedded in it after being translated, rotated, and scaled. Noise was added to the scene points (through quantization round-off error).



Figure  5.7  Average  time  that  the  connectionist  algorithm  requires 
for  a  single  basis  probe,  as  a  function  of  the  number  of  processors  in 
the  Connection  Machine.  The  database  contains  1024  models  of  16 
points  each;  the  points  have  been  generated  by  a  Gaussian  process 
and  the  scenes  contain  200  points.  The  allowed  transformation  is 
similarity  and  rehashing  is  used. 


The front end randomly selected a pair of scene points (a probe) as the basis to be used for possible recognition. For the connectionist algorithm, a probe takes 1.52 seconds on an 8K-processor machine, dropping to 0.24 seconds on a 32K-processor machine (see the plot in Figure 5.7); these timings make use of rehashing but not of the symmetries. If the symmetries were used, the time needed for a probe would drop further, by a factor of about two on average (Section 5.2). Comparing these timing results to those of Figure 3.10, we can see how beneficial the performance enhancements are.
Chapter 6


Noise  Modeling 


In this chapter we introduce a new framework for model-based object recognition
in  the  context  of  geometric  hashing.  We  incorporate  additive  Gaussian  noise 
into  our  model  and  analytically  determine  its  effect  on  the  invariants  that  are 
computed  for  the  case  where  the  models  are  allowed  to  undergo  similarity  or 
affine  transformations. 

Indexing techniques usually suffer from a number of drawbacks: namely, high sensitivity to sensor noise and to quantization of the space of invariants (the "hash space"), poor index selectivity, and non-uniform accumulation of votes in the different bins. Solutions to the problem of non-uniform accumulation of votes, in the context of geometric hashing, were described for certain combinations of transformations and model point distributions in chapter 5. As for noise tolerance, some preliminary results (see [36]) indicated that for a particular type of indexing, small amounts of sensor noise may lead to performance degradation. Matters are complicated by the fact that the index selectivity and the quantization coarseness of the space of invariants are interrelated. For example, if noise tolerance is important, the space of invariants is coarsely quantized, at the expense of reduced index selectivity. If, on the other hand, higher index selectivity is desired, the hash bins are made smaller, resulting in lower noise tolerance.
A common approach to the resolution of these competing demands is the use of higher-dimensional indices. Since the minimum probability of error decreases as the dimensionality of the index increases [27], high-dimensional indexing results in more descriptive power (higher selectivity) and higher noise tolerance. However, the gains from this dimensionality increase cannot continue ad infinitum. Additionally, higher-dimensional indexing is not supported by physiological evidence [46,55].

The number of dimensions needed is typically task dependent: either it is decided upon in advance [21,24], or it depends on the complexity of the objects stored in the database [89]. To our knowledge, no current system makes use of perceptual groupings as suggested by Lowe [68].

Recently, a number of researchers have attempted to build more robust object recognition systems by taking into consideration the effect of sensor noise. These efforts have met with moderate success. In particular, under the assumption of a bounded amount of noise for the sensing device, they derive either positional bounds [47,49,51], or bounds in the space of allowed transformations [18,48]. In all these systems, the databases contain only one or two models and the scenes contain very little clutter: the signal-to-noise ratio for the scene features (i.e. the ratio of the number of points belonging to the model to the number of remaining scene points) is typically about one. Although bounds are appealing because of the tractability of the computations and the ease with which they can be applied, we conjecture that they will be of little or no help in the case of cluttered scenes or large databases, because they treat all the points of the bounded region as equiprobable. An approach where the sensing device is assumed to introduce additive Gaussian noise to the positions of the scene features should lead to improved results.

6.1  Performance  in  the  Presence  of  Noise 

In  this  section,  we  examine  the  performance  of  the  geometric  hashing  approach  in 
the  presence  of  noise  and  discuss  some  suggestions  that  have  been  made  in  order 
to  improve  the  performance. 

We first examine the power of the geometric hashing method as a filtering method. In particular, we have designed and carried out a series of experiments that determine the percentage of model/basis combinations which, for a given probe, receive exactly $k$ votes, $k = 1, 2, 3, \ldots, n$, where $n$ is the number of points in the model [79]. These experiments cover the cases of rigid and similarity transformations and were performed using large databases (512 models) and a large number (40) of test scenes. Similar experiments were reported in [60]; however, the databases used there were very small (no more than 20 objects), and the test scenes contained very little clutter (roughly as many clutter points as model points). In our experiments, there was only one model embedded in the scene. All of our database models consist of 16 point features. Our test scenes contain a total of 200 points (i.e. the signal-to-noise ratio for the scene features was almost 1/12).
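The measurement itself is simple to express. The Python sketch below is a scaled-down, hypothetical version of the experiment, not the thesis code: a small database of Gaussian models, a similarity-transformed copy of one model placed among clutter points, similarity invariants as in Eqns. 6.1 and 6.2, and an arbitrary bin width. For one probe it reports the percentage of all model/basis combinations that receive exactly k votes.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import permutations

W = 0.05                                   # hypothetical bin width
N_MODELS, N_POINTS, N_CLUTTER = 20, 16, 184


def similarity_invariants(p, p1, p2):
    """Eqns. 6.1-6.2: coordinates of p relative to the midpoint of (p1, p2)."""
    mx, my = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    n2 = ex * ex + ey * ey
    dx, dy = p[0] - mx, p[1] - my
    return (dx * ex + dy * ey) / n2, (-dx * ey + dy * ex) / n2


def bin_of(u, v):
    return math.floor(u / W), math.floor(v / W)


def build_table(models):
    """Preprocessing: one entry (model, ordered basis) per non-basis point."""
    table = defaultdict(list)
    for m, pts in enumerate(models):
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i, j):
                    table[bin_of(*similarity_invariants(p, pts[i], pts[j]))].append((m, (i, j)))
    return table


random.seed(7)
models = [[(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N_POINTS)]
          for _ in range(N_MODELS)]
table = build_table(models)

# Scene: model 0 rotated, scaled and translated, followed by clutter points.
c, s = 3.0 * math.cos(1.1), 3.0 * math.sin(1.1)
scene = [(c * x - s * y + 10.0, s * x + c * y + 5.0) for x, y in models[0]]
scene += [(random.uniform(3, 17), random.uniform(-2, 12)) for _ in range(N_CLUTTER)]

# One probe: the scene points that correspond to basis (0, 1) of model 0.
s1, s2 = 0, 1
votes = Counter()
for k, p in enumerate(scene):
    if k not in (s1, s2):
        votes.update(table.get(bin_of(*similarity_invariants(p, scene[s1], scene[s2])), []))

# Percentage of all model/basis combinations that received exactly k votes.
n_combos = N_MODELS * N_POINTS * (N_POINTS - 1)
per_k = Counter(votes.values())
per_k[0] = n_combos - len(votes)
pct = {k: 100.0 * cnt / n_combos for k, cnt in sorted(per_k.items())}
print(pct)
print("votes for the embedded model/basis:", votes[(0, (0, 1))])
```

The database and scene sizes are scaled down from the 512-model, 40-scene experiments described above.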

Figure 6.1 Similarity Transforms: the expected percentage of model/basis combinations receiving exactly $k$ votes. Top: the models' feature points are distributed according to a Gaussian of $\sigma = 1$. Bottom: the models' feature points are distributed uniformly over the unit disc. In both cases, the database contained 512 models, each consisting of 16 points.

Figure 6.2 Percentage of the embedded model's bases receiving $k$ votes when used as probes, for different amounts of Gaussian noise. The models can only undergo similarity transformations. Top: the models' points are distributed according to a Gaussian of $\sigma = 1$. Bottom: the models' points are distributed uniformly over the unit disc. In both cases, the database contained 512 models, each consisting of 16 points.

Figure 6.1 graphically shows the results of some of the experiments, for the case where the database models are allowed to undergo similarity (i.e. rotation, translation and scaling) transformations. As can be seen from these graphs, the filtering power of the method is satisfactory: indeed, less than 2% of all possible model/basis combinations receive more than 9 votes. Figure 6.2, on the other
hand,  shows  the  degradation  of  the  method’s  performance  as  a  function  of  the 
noise. Observe that a small amount of noise in the input suffices to render the model that is
embedded  in  the  scene  practically  undetectable.  The  reason  is  that  the  existence 
of  noise  in  the  input  leads  to  positional  errors,  which  in  turn  translate  to  errors 
in  the  invariants.  If  the  error  in  the  input  is  “small,”  the  computed  invariant, 
after proper quantization, will generate the same index as in the noise-free case, thus
hashing  into  the  correct  hash  table  bin.  Clearly,  the  semantics  of  “small”  in 
this  context  directly  depends  on  the  coarseness  of  quantization  of  the  space  of 
invariants.  Once  this  coarseness  is  decided,  an  associated  degree  of  tolerance  is 
implicitly  built  into  the  hash  table.  One  can  easily  envision  cases  where  a  given 
hash  table  has  insufficient  power  to  discriminate  among  the  stored  models,  for 
certain  choices  of  features  in  the  scene:  the  hash  table  is  too  coarsely  quantized  for 
these  models;  consequently,  the  hash  table  will  yield  too  many  candidate  matches 
during  the  recognition  phase  and  one  must  revert  to  a  situation  of  testing  and 
verifying  many  generated  hypotheses. 
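The dependence of "small" on the quantization can be made concrete with a quick Monte Carlo sketch (ours). A hash location placed uniformly at random within its bin is perturbed by isotropic Gaussian error with a hypothetical standard deviation `s`, and we count how often the perturbed location stays in the same bin, for several bin widths.

```python
import math
import random


def same_bin_rate(bin_width, s, trials=20_000, seed=8):
    """Fraction of perturbed hash locations that remain in their original bin."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u, v = rng.uniform(0, bin_width), rng.uniform(0, bin_width)  # inside bin (0, 0)
        du, dv = rng.gauss(0, s), rng.gauss(0, s)
        if math.floor((u + du) / bin_width) == 0 and math.floor((v + dv) / bin_width) == 0:
            hits += 1
    return hits / trials


for w in (0.01, 0.05, 0.2, 1.0):
    print(w, same_bin_rate(w, s=0.02))
```

Coarser bins tolerate more noise, but every additional model that hashes into the same enlarged bin lowers the selectivity of the index.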

Clearly,  if  a  system  is  to  be  robust  in  a  real  setting,  one  should  be  able  to  cope 
with  the  problems  caused  by  the  positional  uncertainty  of  the  scene  features. 

In  [60],  it  was  suggested  that  during  the  selection  process  a  region  of  the  hash 
table  (range  of  hash  table  bins)  be  accessed  instead  of  a  single  bin.  This  region 
might have a rectangular shape and should be centered at the hash bin which is
derived from the computed invariants. The same approach was suggested for both
the  similarity  and  the  affine  transformation  cases. 

Most geometric hashing systems to date have used a quantized space of invariants, so that entries fall in bins. Whenever a hash location is computed, all entries in the bin receive votes. Alternatively, we may assume that the space of invariants is quantized into tiny bins, and that a hash to a bin invokes votes for all entries in all bins in a circular region about the central bin. This is the approach used by Gavrila and Groen [34] and is implicit in the analysis of Huttenlocher and Grimson [36]. Specifically, this method approximates a weighted-voting function that assigns a unit vote to all entries located in a disk or rectangle centered at the hash location, and zero votes outside the disk or rectangle. Costa et al. [25] have suggested, and we have used in our work, a weighted-voting function in the space of invariants that tapers down to zero in a region about the hash location, in a way that depends on the hash location and the basis selection in the scene. Thus if $(u, v)$ is the hash location, and $(u', v')$ is the location of a nearby entry, then the latter entry will receive a weighted vote of $w(u, v, u', v')$. The function $w(\cdot, \cdot, \cdot, \cdot)$ may also depend on the scene basis that has been selected as a basis probe.
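The difference between the two voting styles can be sketched as follows. This is an illustration under our own assumptions: the tapering function $w$ is taken to be a truncated Gaussian in the space of invariants, with a caller-supplied 2×2 covariance standing in for its dependence on the hash location and the scene basis; it is not necessarily the taper used in this work. The box vote is the unit-vote-in-a-disk scheme described above.

```python
import math


def box_vote(u, v, u2, v2, radius):
    """Unit vote for entries within `radius` of the hash location, zero outside."""
    return 1.0 if math.hypot(u2 - u, v2 - v) <= radius else 0.0


def gaussian_vote(u, v, u2, v2, cov, cutoff=3.0):
    """Vote that tapers with the Mahalanobis distance of (u2, v2) from (u, v) and is
    zero beyond `cutoff` standard deviations; `cov` is a symmetric 2x2 matrix."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    du, dv = u2 - u, v2 - v
    m2 = (d * du * du - 2 * b * du * dv + a * dv * dv) / det
    return math.exp(-0.5 * m2) if m2 <= cutoff * cutoff else 0.0


# A hypothetical elongated covariance: spread 0.1 along u, 0.02 along v.
cov = ((0.1**2, 0.0), (0.0, 0.02**2))
for du, dv in [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05), (0.5, 0.0)]:
    print((du, dv), box_vote(0, 0, du, dv, 0.06), round(gaussian_vote(0, 0, du, dv, cov), 3))
```

Unlike the box vote, the tapered vote distinguishes two entries at the same Euclidean distance when the uncertainty region is elongated.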

It is anticipated that voting for regions of the hash table instead of individual hash table bins will increase the degree of cross-talk among the different models: a larger number of distinct model/basis combinations is expected to receive votes. Consequently, the verification step of the recognition phase will be more expensive. One might also observe an increased probability of false matches, although this is not likely to be an issue for small databases. As we will see in chapters 7 and 8, combining the idea of voting for regions with the idea of weighted voting [42] considerably improves the results. A first attempt at using weighted voting was described in [25]; the reported results were not very encouraging, despite the fact that the database contained only one model.

Figure 6.3 Regions of the hash table that would need to be accessed in the case of Gaussian error in the positions of the point features. The models are allowed to undergo a similarity transformation. The left graph of each pair shows the feature space domain, whereas the right shows the space of invariants. For presentation purposes, the amount of Gaussian error was deliberately large.

Figure 6.4 Regions of the hash table that would need to be accessed in the case of Gaussian error in the positions of the point features. The transformation class is affine transformations. The left graph of each pair shows the feature space domain, whereas the right shows the space of invariants. For presentation purposes, the amount of Gaussian error was deliberately large.

Although the above ideas are reasonable, they do not capture the nature of the problem. In particular, the size, shape and orientation of the regions that need
to be accessed depend directly on the selected basis tuple, as well as on the computed hash locations. Figures 6.3 and 6.4 show this dependence for certain point configurations. As can be seen, the region variations are much more pronounced in the affine transformation case. These two figures provide a sound argument against the straightforward use of either Manhattan or Euclidean distances when determining the size of the regions that need to be accessed. Clearly, an adaptive scheme is needed, and the first step towards creating a working system that can perform satisfactorily in the presence of noise is the derivation of formulas quantifying the observed behavior. As we will show next, such formulas are easy to derive, and, furthermore, they are compatible with a Bayesian interpretation of geometric hashing.

6.2  Modeling  Positional  Noise 

In this section we present a model for sensor noise and describe the effect of positional error on the values of the computed hash locations. We concentrate on the cases where the models are allowed to undergo similarity or affine transformations.

Traditionally, sensor noise has been modeled as additive Gaussian perturbations. The perturbations are assumed to be statistically independent and distributed according to a Gaussian distribution of standard deviation $\sigma$, centered at the "true" value of the variable. This is the noise model we will assume in our analysis of the dependence of the computed hash locations on noise in the input.

Case: Similarity Transformation. Let us consider a scene that contains $S$ feature points. Let $(x_i, y_i)$ be the "true" location of the $i$-th feature point in the scene. Let also $(X_i, Y_i)$ be the continuous random variables denoting the coordinates of the $i$-th feature as these are measured by the sensing device (in this case a camera). The joint probability density function (pdf) of $X_i$ and $Y_i$ is then given by:

$$ f(X_i, Y_i) = \frac{1}{2\pi\sigma^2} \exp\!\left( -\frac{(X_i - x_i)^2 + (Y_i - y_i)^2}{2\sigma^2} \right), \qquad i = 1, 2, \ldots, S, $$

where we have assumed that the standard deviation $\sigma$ applies to both $X_i$ and $Y_i$, for all values of $i$.

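Let $(x_1, y_1)$ and $(x_2, y_2)$ define an ordered basis (and thus a coordinate frame $Oxy$). Let also $(x, y)$ be a third feature point whose coordinates $(u, v)$ in the frame $Oxy$ we wish to compute. Writing $p = (x, y)$, $p_i = (x_i, y_i)$, and $p_0 = (p_1 + p_2)/2$ for the midpoint of the basis pair, solving Eqn. 2.2 for $u, v$ yields

$$ u = \frac{(p - p_0)(p_2 - p_1)^T}{\| p_2 - p_1 \|^2}, \tag{6.1} $$

$$ v = \frac{(p - p_0)(p_2 - p_1)_\perp^T}{\| (p_2 - p_1)_\perp \|^2}, \tag{6.2} $$

where $(a, b)_\perp = (-b, a)$; or, assuming that $U$ and $V$ are the random variables denoting coordinates in the space of invariants,

$$ U = \frac{(X - X_0,\ Y - Y_0)(X_2 - X_1,\ Y_2 - Y_1)^T}{\| (X_2 - X_1,\ Y_2 - Y_1) \|^2}, \tag{6.3} $$

$$ V = \frac{(X - X_0,\ Y - Y_0)(X_2 - X_1,\ Y_2 - Y_1)_\perp^T}{\| (X_2 - X_1,\ Y_2 - Y_1)_\perp \|^2}, \tag{6.4} $$

where $(X_0, Y_0) = \big( (X_1 + X_2)/2,\ (Y_1 + Y_2)/2 \big)$.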


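From probability theory, the joint pdf of $U$ and $V$ can be expressed as

$$ f(U, V) = \int_{\mathbb{R}^4} f\big(X(U,V), Y(U,V)\big)\, f(X_1, Y_1)\, f(X_2, Y_2)\, |J|^{-1}\, dX_1\, dX_2\, dY_1\, dY_2, \tag{6.5} $$

where $|J|^{-1} = \| (X_2 - X_1,\ Y_2 - Y_1) \|^2$. Since the pdfs $f(X, Y)$, $f(X_1, Y_1)$ and $f(X_2, Y_2)$ are known, $f(U, V)$ can be computed from Eqn. 6.5, yielding:

$$ f(U, V) = \frac{\| p_2 - p_1 \|^2 \left[ \left( 4\,(U,V)(u,v)^T + 3 \right)^2 + \left( 4\,(U,V)(u,v)_\perp^T \right)^2 \right] + 12\sigma^2 \left( 4\|(U,V)\|^2 + 3 \right)}{\pi\sigma^2 \left( 4\|(U,V)\|^2 + 3 \right)^3} \exp\!\left( -\frac{\| p_2 - p_1 \|^2}{\sigma^2}\, \frac{\| (U,V) - (u,v) \|^2}{4\|(U,V)\|^2 + 3} \right), \tag{6.6} $$

where $(u, v)$ are the invariants computed from the true positions.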

In  Appendix  A,  we  give  more  details  on  the  derivation  of  this  formula. 

Given a Gaussian perturbation of the three feature positions, Eqn. 6.6 describes the "spread" over the space of invariants of the computed coordinates. This formula substantiates the experimental observation that the shape and the size of the spread depend on the basis pair as well as on the value of the computed hash location.
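Eqn. 6.6 can be sanity-checked numerically. The Python sketch below (ours) writes the density out in `f_exact` and integrates it on a grid for a hypothetical configuration: basis points (-1, 0) and (1, 0), true invariants (u, v) = (0.3, 0.2), and σ = 0.1. It checks that the density integrates to one and compares its value at (u, v) with the peak of a Gaussian whose per-coordinate variance is $\sigma^2(4(u^2 + v^2) + 3)/(2\|p_2 - p_1\|^2)$, the first-order approximation discussed below.

```python
import math


def f_exact(U, V, u, v, e2, sigma):
    """Eqn. 6.6: pdf of the computed invariants (U, V) given the true (u, v);
    e2 is ||p2 - p1||^2 for the true basis points."""
    q = 4.0 * (U * U + V * V) + 3.0
    dot = U * u + V * v        # (U, V)(u, v)^T
    cross = -U * v + V * u     # (U, V)(u, v)_perp^T with (u, v)_perp = (-v, u)
    amp = e2 * ((4 * dot + 3) ** 2 + (4 * cross) ** 2) + 12 * sigma**2 * q
    expo = -e2 * ((U - u) ** 2 + (V - v) ** 2) / (sigma**2 * q)
    return amp / (math.pi * sigma**2 * q**3) * math.exp(expo)


u, v, e2, sigma = 0.3, 0.2, 4.0, 0.1   # basis points (-1, 0) and (1, 0)
h, half_width = 0.005, 0.6
n = int(round(2 * half_width / h))
# Midpoint-rule integral over a square centered at (u, v).
total = sum(
    f_exact(u - half_width + (i + 0.5) * h, v - half_width + (j + 0.5) * h, u, v, e2, sigma)
    for i in range(n) for j in range(n)
) * h * h

var_first_order = sigma**2 * (4 * (u * u + v * v) + 3) / (2 * e2)
peak_first_order = 1 / (2 * math.pi * var_first_order)
print(round(total, 4), f_exact(u, v, u, v, e2, sigma), peak_first_order)
```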

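The above formula is too complicated to be practical. For all practical purposes, the second and higher-order components of the expression in Eqn. 6.6 are not of any importance. It would thus be beneficial to derive a first-order approximation for the spread over the space of invariants. To this end, we observe that Eqns. 6.1 and 6.2 form the solution to the matrix equation

$$ \begin{pmatrix} x_2 - x_1 & -y_2 + y_1 \\ y_2 - y_1 & x_2 - x_1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x - \frac{x_1 + x_2}{2} \\ y - \frac{y_1 + y_2}{2} \end{pmatrix}, \quad \text{i.e.} \quad A\,x = b. \tag{6.7} $$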

Let  us  now  introduce  Gaussian  perturbation  in  the  positions  of  the  three  points. 
Then,  Eqn.  6.7  can  be  rewritten  as  follows: 


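$$ \begin{aligned} (A + \delta A)\, x = b + \delta b \;&\Rightarrow\; A\,(I + A^{-1}\delta A)\, x = b + \delta b \\ &\Rightarrow\; x = (I + A^{-1}\delta A)^{-1} A^{-1} (b + \delta b) \approx (I - A^{-1}\delta A)\, A^{-1} (b + \delta b) \\ &\Rightarrow\; x \approx A^{-1} b + A^{-1}\delta b - A^{-1}\,\delta A\, A^{-1} b, \end{aligned} \tag{6.8} $$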


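where we have ignored second- and higher-order perturbation terms. Since $A^{-1}\mathbf{b}$ is the noise-free solution $\mathbf{x} = (u,v)$, the Gaussian noise induces a perturbation $\delta\mathbf{x}$ of the correct solution that is equal to:

$$ \delta\mathbf{x} = A^{-1}\,\delta\mathbf{b} - A^{-1}\,\delta A\,\mathbf{x}. $$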
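The quality of the first-order approximation can be checked numerically. The Python sketch below is our own illustration, and its helper names are hypothetical. It perturbs the three points of Eqn. 6.7 by a small Gaussian jitter and computes $\delta\mathbf{x} = A^{-1}\delta\mathbf{b} - A^{-1}\delta A\,\mathbf{x}$. It then compares $\mathbf{x} + \delta\mathbf{x}$ with the exact solution of the perturbed system. The residual should be of second order in the noise level, so it should be much smaller than $\delta\mathbf{x}$ itself.

```python
import random


def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det)


def system(p1, p2, p):
    """Matrix A and right-hand side b of Eqn. 6.7."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    a = [[dx, -dy], [dy, dx]]
    b = [p[0] - (p1[0] + p2[0]) / 2.0, p[1] - (p1[1] + p2[1]) / 2.0]
    return a, b


def first_order_check(p1, p2, p, eps, rng):
    """Return (max |exact - (x + dx)|, max |dx|) for one random perturbation of size eps."""
    a, b = system(p1, p2, p)
    x = solve2(a, b)

    def jitter(q):
        return (q[0] + rng.gauss(0.0, eps), q[1] + rng.gauss(0.0, eps))

    a2, b2 = system(jitter(p1), jitter(p2), jitter(p))
    da = [[a2[i][j] - a[i][j] for j in range(2)] for i in range(2)]
    db = [b2[i] - b[i] for i in range(2)]
    # dx = A^-1 (db - dA x), i.e. A^-1 db - A^-1 dA x
    rhs = [db[i] - (da[i][0] * x[0] + da[i][1] * x[1]) for i in range(2)]
    dx = solve2(a, rhs)
    exact = solve2(a2, b2)
    residual = max(abs(exact[i] - (x[i] + dx[i])) for i in range(2))
    return residual, max(abs(dx[0]), abs(dx[1]))


rng = random.Random(0)
print(first_order_check((0.0, 0.0), (2.0, 0.0), (1.0, 1.0), 1e-4, rng))
```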


If $(X,Y)$, $(X_i,Y_i)_{i=1,2}$ and $(U,V)$ are stochastic variables denoting the perturbations of $p$, $\{p_i\}_{i=1,2}$ and $(u,v)$, respectively, then the last equation can be rewritten as:


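$$ \begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} -ku(X_2 - X_1) + kv(Y_2 - Y_1) - \tfrac{k}{2}(X_1 + X_2) + kX - lu(Y_2 - Y_1) - lv(X_2 - X_1) - \tfrac{l}{2}(Y_1 + Y_2) + lY \\ -mu(X_2 - X_1) + mv(Y_2 - Y_1) - \tfrac{m}{2}(X_1 + X_2) + mX - nu(Y_2 - Y_1) - nv(X_2 - X_1) - \tfrac{n}{2}(Y_1 + Y_2) + nY \end{pmatrix} $$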

where we have substituted $A^{-1}$ by $\begin{pmatrix} k & l \\ m & n \end{pmatrix}$ and $k,l,m,n \in \mathbb{R}$. Analogously to Eqn. 6.5, the joint pdf $f(U,V)$ of $U$ and $V$ is equal to:

$$ f(U,V) = \int_{\mathbb{R}^4} f\big(X(U,V),\,Y(U,V)\big)\, f(X_1,Y_1)\, f(X_2,Y_2)\, |kn - ml|^{-1}\, dX_1\, dX_2\, dY_1\, dY_2. $$

Since  U,  V  are  linear  combinations  of  normally  distributed  stochastic  variables, 
the  joint  distribution  of  U,  V  will  again  be  normal.  Indeed,  evaluation  of  the 
above  integral  yields  for  the  probability  density  of  the  perturbation  of  x 


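$$ f(U,V) = \frac{1}{2\pi\,|\Sigma_s|^{1/2}}\exp\!\left(-\tfrac{1}{2}\,(U,V)\,\Sigma_s^{-1}\,(U,V)^T\right), \qquad \Sigma_s = \frac{\sigma^2\big(4\|(u,v)\|^2 + 3\big)}{2\,\|p_2 - p_1\|^2}\, I \qquad (6.9) $$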
In other words, if we ignore second- and higher-order terms, additive Gaussian positional error results in computed values for the similarity invariants that are also Gaussian distributed around their “true” value, with covariance matrix $\Sigma_s$. Comparing the last equation with Eqn. 6.6 shows that it indeed captures the first-order terms of Eqn. 6.6.
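The covariance can also be checked by simulation. We assume independent Gaussian noise with standard deviation $\sigma$ on each coordinate of $p_1$, $p_2$ and $p$. Under that assumption, working the linearization through with the midpoint origin of Eqn. 6.7 gives an isotropic covariance $\Sigma_s = \sigma^2\,(4\|(u,v)\|^2 + 3)\,/\,(2\|p_2 - p_1\|^2)\cdot I$. The Python sketch below compares this closed form with the sample covariance of invariants recomputed from noisy copies of the three points. Its helper names are hypothetical.

```python
import random


def invariants(p1, p2, p):
    """Similarity invariants (u, v) of p in the frame of the basis (p1, p2), Eqn. 6.7."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    bx = p[0] - (p1[0] + p2[0]) / 2.0
    by = p[1] - (p1[1] + p2[1]) / 2.0
    n2 = dx * dx + dy * dy
    return (bx * dx + by * dy) / n2, (by * dx - bx * dy) / n2


def predicted_variance(p1, p2, u, v, sigma):
    """Per-axis variance of the first-order model; the off-diagonal term is zero."""
    n2 = (p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2
    return sigma ** 2 * (4.0 * (u * u + v * v) + 3.0) / (2.0 * n2)


def sample_covariance(p1, p2, p, sigma, trials, seed=0):
    """Monte Carlo estimate of [[var_u, cov_uv], [cov_uv, var_v]]."""
    rng = random.Random(seed)

    def noisy(q):
        return (q[0] + rng.gauss(0.0, sigma), q[1] + rng.gauss(0.0, sigma))

    s = [invariants(noisy(p1), noisy(p2), noisy(p)) for _ in range(trials)]
    mu = sum(t[0] for t in s) / trials
    mv = sum(t[1] for t in s) / trials
    vu = sum((t[0] - mu) ** 2 for t in s) / (trials - 1)
    vv = sum((t[1] - mv) ** 2 for t in s) / (trials - 1)
    cuv = sum((t[0] - mu) * (t[1] - mv) for t in s) / (trials - 1)
    return [[vu, cuv], [cuv, vv]]


p1, p2, p = (0.0, 0.0), (4.0, 0.0), (3.0, 2.0)
u, v = invariants(p1, p2, p)                         # (0.25, 0.5)
print(predicted_variance(p1, p2, u, v, 0.01))
print(sample_covariance(p1, p2, p, 0.01, 20000))
```

Moving $p$ farther from the basis midpoint, or shortening the basis, increases the predicted variance. This matches the dependence on basis length and on $\|(u,v)\|$ described in the text.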

As can be seen from the expression for the covariance matrix $\Sigma_s$, the larger the separation of the two basis points, the smaller the spread in the space of invariants; this is a well-known observation. Eqn. 6.9 also introduces a new result. Indeed, for a given basis separation, the distance of the point whose coordinates we compute in the coordinate frame of the basis also affects the spread: the smaller this point's distance from the center of the coordinate frame, the smaller the spread in the space of invariants will be. The exact dependence is described by Eqn. 6.9. In Chapter 4, we showed that the distribution of the entries over the space of invariants is non-uniform: hash indices near the center of the hash table are more frequent than indices further away. Given the result of Eqn. 6.9, we can see that there exists a trade-off between the indexing power of an invariant tuple and its sensitivity to noise. Although index values corresponding to relatively unpopulated regions of the space of invariants carry more information, they are very sensitive to noise. The opposite is also true: indices hashing near the peak of the hash table distribution (a densely populated area) are less informative but also more tolerant to noise. To summarize, it is the entire triplet, and not only the basis pair, that determines the size of the spread. In other words, there is a distinct covariance matrix for every ordered triplet that can be formed using model points.
Case: Affine Transformation. The above analysis can be repeated for the case of the affine transformation. Let $(x_1,y_1)$, $(x_2,y_2)$ and $(x_3,y_3)$ define an ordered triplet and thus a coordinate frame $Oxy$. Let also $(x,y)$ be a fourth feature point whose coordinates $(u,v)$ in the frame $Oxy$ we wish to compute. We will consider the general case (Eqn. 4.5), where the center of the (skewed) coordinate system defined by the basis triplet is expressed as a function of the position vectors of the points comprising the basis. Eqn. 4.5 can be rewritten in matrix form as


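$$ \begin{pmatrix} x_2 - x_1 & x_3 - x_1 \\ y_2 - y_1 & y_3 - y_1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x - (\alpha x_1 + \beta x_2 + \gamma x_3) \\ y - (\alpha y_1 + \beta y_2 + \gamma y_3) \end{pmatrix} \qquad (6.10) $$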
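For illustration, Eqn. 6.10 can again be solved by Cramer's rule. The Python sketch below is our own. It assumes that the weights satisfy $\alpha + \beta + \gamma = 1$, as the centroid choice $\alpha = \beta = \gamma = 1/3$ does; this is what makes the center move together with the points. The usage lines check that $(u,v)$ is unchanged when all four points undergo the same affine map. The helper names are hypothetical.

```python
def affine_invariants(p1, p2, p3, p, alpha=1 / 3, beta=1 / 3, gamma=1 / 3):
    """Solve Eqn. 6.10: p - (alpha*p1 + beta*p2 + gamma*p3) = u*(p2 - p1) + v*(p3 - p1)."""
    a11, a12 = p2[0] - p1[0], p3[0] - p1[0]      # columns of A are p2 - p1 and p3 - p1
    a21, a22 = p2[1] - p1[1], p3[1] - p1[1]
    bx = p[0] - (alpha * p1[0] + beta * p2[0] + gamma * p3[0])
    by = p[1] - (alpha * p1[1] + beta * p2[1] + gamma * p3[1])
    det = a11 * a22 - a12 * a21
    if det == 0.0:
        raise ValueError("basis points must not be collinear")
    return (bx * a22 - a12 * by) / det, (a11 * by - bx * a21) / det


def apply_affine(q, m, t):
    """Map q to m q + t for a 2x2 matrix m and translation t."""
    return (m[0][0] * q[0] + m[0][1] * q[1] + t[0],
            m[1][0] * q[0] + m[1][1] * q[1] + t[1])


pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (2.0, 2.0)]
print(affine_invariants(*pts))                   # approximately (1/3, 1/3)
m, t = [[2.0, 1.0], [0.5, 3.0]], (4.0, -1.0)
print(affine_invariants(*[apply_affine(q, m, t) for q in pts]))
```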


Introducing Gaussian perturbations in the positions of all four points will result in a perturbation of the correct solution $\mathbf{x} = (u,v)$ by

$$ \delta\mathbf{x} = A^{-1}\,\delta\mathbf{b} - A^{-1}\,\delta A\,\mathbf{x}. $$


Again, the second- and higher-order perturbation terms have been ignored. If $(X,Y)$, $(X_i,Y_i)_{i=1,2,3}$ and $(U,V)$ are stochastic variables denoting the perturbations of $p$, $\{p_i\}_{i=1,2,3}$ and $(u,v)$, respectively, then the last equation can be rewritten as:

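$$ \begin{pmatrix} U \\ V \end{pmatrix} = \begin{pmatrix} -ku(X_2 - X_1) - kv(X_3 - X_1) - k(\alpha X_1 + \beta X_2 + \gamma X_3) + kX - lu(Y_2 - Y_1) - lv(Y_3 - Y_1) - l(\alpha Y_1 + \beta Y_2 + \gamma Y_3) + lY \\ -mu(X_2 - X_1) - mv(X_3 - X_1) - m(\alpha X_1 + \beta X_2 + \gamma X_3) + mX - nu(Y_2 - Y_1) - nv(Y_3 - Y_1) - n(\alpha Y_1 + \beta Y_2 + \gamma Y_3) + nY \end{pmatrix} $$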

where we have substituted $A^{-1}$ by $\begin{pmatrix} k & l \\ m & n \end{pmatrix}$ and $k,l,m,n \in \mathbb{R}$. From standard probability theory, the joint probability density $f(U,V)$ of $U$ and $V$ is given by:

$$ f(U,V) = \int_{\mathbb{R}^6} f\big(X(U,V),\,Y(U,V)\big)\, f(X_1,Y_1)\, f(X_2,Y_2)\, f(X_3,Y_3)\, |kn - ml|^{-1}\, dX_1\, dX_2\, dX_3\, dY_1\, dY_2\, dY_3 \qquad (6.11) $$

Both $U$ and $V$ are linear combinations of normally distributed stochastic variables. Thus, the joint distribution of $U, V$ will again be normal. Evaluation of the above integral yields for the probability density of the perturbation of $\mathbf{x}$

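$$ f(U,V) = \frac{1}{2\pi\,|\Sigma_a|^{1/2}}\exp\!\left(-\tfrac{1}{2}\,(U,V)\,\Sigma_a^{-1}\,(U,V)^T\right), \qquad \Sigma_a = \sigma^2\left[1 + (u+v-\alpha)^2 + (u+\beta)^2 + (v+\gamma)^2\right](A^TA)^{-1}, $$

$$ (A^TA)^{-1} = \frac{1}{\|p_2-p_1\|^2\,\|p_3-p_1\|^2 - \big((p_2-p_1)\cdot(p_3-p_1)\big)^2}\begin{pmatrix} \|p_3-p_1\|^2 & -(p_2-p_1)\cdot(p_3-p_1) \\ -(p_2-p_1)\cdot(p_3-p_1) & \|p_2-p_1\|^2 \end{pmatrix} \qquad (6.12) $$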

In other words, if we ignore second- and higher-order terms, additive Gaussian positional error results in computed values for the affine invariants that are also Gaussian distributed around their “true” value, with covariance matrix $\Sigma_a$.
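As in the similarity case, a simulation offers a sanity check. We assume independent Gaussian noise with standard deviation $\sigma$ on every coordinate of the four points. Working the linearization through under that assumption gives $\Sigma_a = \sigma^2\,[1 + (u+v-\alpha)^2 + (u+\beta)^2 + (v+\gamma)^2]\,(A^TA)^{-1}$, where the columns of $A$ are $p_2 - p_1$ and $p_3 - p_1$. The Python sketch below compares this expression with a Monte Carlo estimate for a skewed basis, where the off-diagonal term is clearly non-zero. Its helper names are hypothetical, and it uses centroid weights by default.

```python
import random


def affine_invariants(p1, p2, p3, p, alpha=1 / 3, beta=1 / 3, gamma=1 / 3):
    """(u, v) solving p - (alpha*p1 + beta*p2 + gamma*p3) = u*(p2 - p1) + v*(p3 - p1)."""
    a11, a12 = p2[0] - p1[0], p3[0] - p1[0]
    a21, a22 = p2[1] - p1[1], p3[1] - p1[1]
    bx = p[0] - (alpha * p1[0] + beta * p2[0] + gamma * p3[0])
    by = p[1] - (alpha * p1[1] + beta * p2[1] + gamma * p3[1])
    det = a11 * a22 - a12 * a21
    return (bx * a22 - a12 * by) / det, (a11 * by - bx * a21) / det


def predicted_covariance(p1, p2, p3, u, v, sigma, alpha=1 / 3, beta=1 / 3, gamma=1 / 3):
    """First-order covariance sigma^2 * c * (A^T A)^-1 under i.i.d. coordinate noise."""
    e1 = (p2[0] - p1[0], p2[1] - p1[1])
    e2 = (p3[0] - p1[0], p3[1] - p1[1])
    g11 = e1[0] ** 2 + e1[1] ** 2                 # entries of the Gram matrix A^T A
    g22 = e2[0] ** 2 + e2[1] ** 2
    g12 = e1[0] * e2[0] + e1[1] * e2[1]
    det = g11 * g22 - g12 * g12
    c = sigma ** 2 * (1 + (u + v - alpha) ** 2 + (u + beta) ** 2 + (v + gamma) ** 2)
    return [[c * g22 / det, -c * g12 / det], [-c * g12 / det, c * g11 / det]]


def sample_covariance(pts, sigma, trials, seed=0):
    """Monte Carlo covariance of the invariants computed from noisy copies of pts."""
    rng = random.Random(seed)
    s = []
    for _ in range(trials):
        noisy = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in pts]
        s.append(affine_invariants(*noisy))
    mu = sum(t[0] for t in s) / trials
    mv = sum(t[1] for t in s) / trials
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for t in s:
        d = (t[0] - mu, t[1] - mv)
        for i in range(2):
            for j in range(2):
                cov[i][j] += d[i] * d[j] / (trials - 1)
    return cov


pts = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0), (2.0, 1.0)]  # skewed basis p1, p2, p3, then p
u, v = affine_invariants(*pts)                          # about (1/12, 0)
print(predicted_covariance(*pts[:3], u, v, 0.01))
print(sample_covariance(pts, 0.01, 20000))
```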

It is clear from expression 6.12 that the spread around the noise-free solution will be small for small values of the positional error in the input, small values of the computed invariants, and long bases. Also, the less skewed the coordinate frame that the three model points define, the smaller the spread will be.
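Since the covariance depends on all four points, there is a distinct covariance matrix for every ordered quadruplet that can be formed using model points. For the special case where $\alpha = \beta = \gamma = 1/3$, the covariance matrix $\Sigma_a$ becomes

$$ \Sigma_a = \sigma^2\left[\tfrac{4}{3} + 2\,(u^2 + uv + v^2)\right](A^TA)^{-1}, $$

where $A$ is the matrix of Eqn. 6.10.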
Chapter 7

Bayesian Interpretation


In this chapter we provide an interpretation of geometric hashing which shows that the algorithm is equivalent to a Bayesian maximum-likelihood object recognition method. The hypotheses span the discrete collection of models and the discrete pairings of image features to model features.

The equivalence is precise only in the case where an adaptive weighted voting scheme is used to accumulate evidence for model/basis combinations in the geometric hashing framework. We provide formulas for the weighting functions in the case of similarity- and affine-invariant recognition of point patterns, under the assumption of Gaussian positional error in the image points. Finally, we discuss how the weighted-voting scheme and maximum-likelihood hypothesis selection reduce false alarms.

We begin the chapter with an abstract presentation of the geometric hashing method, reformulating the more specific version given in Section 2.2.1.
7.1 Abstract Formulation of Geometric Hashing


As we have already mentioned, geometric hashing operates on image features in order to locate objects. Image features are generated as the output of the feature extraction stage and typically are local measurements which depend on the grey levels of an image region. Each feature typically results in a vector of measurements. Minimally, a feature carries only positional information, in which case it is a two-vector giving the coordinates of a location. However, there may be other measurements (attributes) associated with a feature, such as an angle measurement, a grey-level value, a direction (for example, of a line), etc.

Henceforth we will assume that the image features are elements of $\mathbb{R}^p$ and that $c$ image features are needed to determine a basis. Nominally $p = 2$, i.e. the features are points. The case $c = 2$ corresponds to rigid and similarity invariance, whereas $c = 3$ corresponds to affine invariance. Depending on the feature type and invariance class, other combinations of $p$ and $c$ are also possible. We will further assume that the class of possible transformations under which we desire invariant recognition is given by $\mathcal{F}$. The members of $\mathcal{F}$ act on image features.

The database contains $M$ models: $1, 2, \ldots, M$. Each model $m$ is a collection of $n$ features $F_m = \{f_{m,k}\}_{k=1}^{n}$. For simplicity we assume that the number of features is the same across the models.

For recognition we are given an image $\mathcal{S}$ containing a total of $S$ features, $\mathcal{S} = \{p_1, p_2, \ldots, p_S\}$. In the terminology of Flynn [32], we seek a collection of interpretations. An interpretation is a tuple

$$ \big[m,\, [f_{m,j_1}, p_{k_1}],\, [f_{m,j_2}, p_{k_2}],\, \ldots,\, [f_{m,j_r}, p_{k_r}]\big] $$

where $m$ gives the index number of a model ($1 \le m \le M$) and the remaining $r$ pairs establish approximate correspondences between a subset of the model features of $F_m$ and image features. Typically an interpretation involves more than $c$ matches (i.e. $r > c$), so that it carries an implicit transformation that approximately maps a subset of features in a model onto a subset of image features; the pairings are all distinct. An interpretation with $r = c$ distinct pairs is a candidate matching (or simply a matching), which can be verified or rejected according to whether it can be extended to a viable interpretation with a number of matches substantially greater than $c$. Note that a priori a recognition system must search over all possible matchings

$$ \big[m,\, [f_{m,j_1}, p_{k_1}],\, [f_{m,j_2}, p_{k_2}],\, \ldots,\, [f_{m,j_c}, p_{k_c}]\big] $$

to determine whether they can be extended to interpretations. Clearly, since there are $M$ models, $n$ features in each model and $S$ image features, there are $O(M n^c S^c)$ candidate matchings.

Hash Functions. In the geometric hashing model we use a hash function that maps groups of $N = c + d$ features to $\mathbb{R}^e$. Here $d$ is the number of “extra features” and $e$ is the dimensionality of the space of invariants. Again, nominally, when we use point features we have $d = 1$, i.e. the hash function is defined on groups that consist of one more point than the basis set, and $e = 2$, meaning that the hash function maps into the Cartesian plane. More generally, however, the hash function uses collections of features to map into a higher-dimensional space. The essential point about the hash function is that it is $\mathcal{F}$-invariant. Thus if $T \in \mathcal{F}$ is a transformation and $p_1, p_2, \ldots, p_N$ is a collection of $N$ features, then

$$ h(Tp_1, Tp_2, \ldots, Tp_N) = h(p_1, p_2, \ldots, p_N). $$
Further, in order to enable recognition in the presence of noise, $h$ must be a continuous function. This is in complete contrast to normal hashing methods for database search, where the hashing function typically randomizes the keys as much as possible. As such, hashing is not necessarily the most appropriate name for a technique that uses a continuous function $h$; perhaps it should instead be called “geometric indexing.”

An additional desirable property of the hash function is that its fibers coincide with equivalence classes of the transformation group. Thus if two collections of points have the same hash location,

$$ h(p_1, p_2, \ldots, p_N) = h(p'_1, p'_2, \ldots, p'_N), $$

then there exists a transformation $T \in \mathcal{F}$ such that

$$ (Tp_1, Tp_2, \ldots, Tp_N) = (p'_1, p'_2, \ldots, p'_N). $$

We will say that a hash function that satisfies this property is $\mathcal{F}$-specific, as in, for example, affine-transformation-specific. An $\mathcal{F}$-specific hash function is a bijection on equivalence classes of $N$-tuples of features modulo transformations from the class $\mathcal{F}$. Workable systems are likely also possible with hash functions that are not transformation-class-specific.

We can now see why the hash function must operate on $N$ features with $N > c$. Suppose we define a hash function on a basis set or sub-basis set. A transformation can map that set to any other configuration, so the hash value must be the same regardless of the configuration of the features. That is, for any other configuration we could map the initial basis set (or sub-basis set) to the given configuration, and the hash value would have to remain the same. A constant hash function is of little use.
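For the similarity class the argument can be made concrete. We choose to treat points as complex numbers, so that every similarity has the form $z \mapsto az + b$ with $a \neq 0$. For any two distinct points $z_1, z_2$ and any two distinct targets $w_1, w_2$, the coefficients can then be solved for directly. The hypothetical helper below does this. Because every pair can be carried onto every other pair, any similarity-invariant function of a pair alone must be constant.

```python
def similarity_between(z1, z2, w1, w2):
    """Return (a, b) such that z -> a*z + b maps z1 to w1 and z2 to w2.

    Points are complex numbers; requires z1 != z2 and w1 != w2 so that a != 0,
    i.e. the map is a genuine rotation + uniform scaling + translation."""
    if z1 == z2 or w1 == w2:
        raise ValueError("both pairs must consist of distinct points")
    a = (w2 - w1) / (z2 - z1)
    b = w1 - a * z1
    return a, b


a, b = similarity_between(0 + 0j, 1 + 0j, 3 + 4j, 5 - 2j)
print(a, b)                    # (2-6j) (3+4j)
print(a * 0 + b, a * 1 + b)    # (3+4j) (5-2j)
```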
On the other hand, in practice features are not arbitrarily modifiable by the transformations. For example, if there are multiple feature types, then the type of a feature is usually not changed by a transformation. In this case collections of features, even within a basis set, contain certain inherent information that is invariant with respect to the transformation class, and thus a hash function may be defined that takes advantage of this information. Our treatment here is more elementary and abstract; thus we assume that any collection of $c$ or fewer features may be mapped by a transformation to any configuration of the same number of features. Accordingly, a hash function defined on $c$ or fewer features will be constant.

However, a hash function defined on $N = c + d$ features will map into a Euclidean space and can exhibit up to $d \cdot p$ degrees of freedom, where we recall that $p$ is the dimensionality of the features. Thus the space of invariants can effectively span a $(d \cdot p)$-manifold and is most reasonably parameterized by Euclidean space. Consequently, the dimensionality $e$ of the hash space will never exceed the number of extra degrees of freedom in the arguments to the hash function, i.e. $e \le d \cdot p$. Thus if the hash function is defined on groups of 2D points consisting of a basis set plus one point, then the maximum effective dimensionality of the space of invariants will be two. However, if the hash function is defined on basis sets plus two points, then we can have a four-dimensional space of invariants.

Indexing. Hash locations are used to index to model/basis combinations. Indexing is based on a two-phase process. In the first phase the models are processed off-line and the hash table data structure is built. During this preprocessing phase, hash values are computed for combinations of $N$ features from models. Ideally, $h(f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_N})$ is computed for every set $F_m$, for every combination of $N$ features $f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_N} \in F_m$, and for every permutation of these $N$ features. In practice certain permutations might be omitted and some combinations might be collapsed because of symmetries in the hash function. Normally at least $c$ of the model features will be distinct, but there is no general reason for prohibiting repetitions. For each computed hash location in $\mathbb{R}^e$ we define an entry in the space of invariants. An entry consists of a tagged point in $\mathbb{R}^e$ such that, given a point $(u,v)$ in the space of invariants, a selection algorithm can quickly locate nearby entries in the hash space and access the tagged information on each such entry. One might reasonably use a k-D tree representation for the set of all entries [74].


Figure  7.1  The  preprocessing  phase:  for  each  model  and  for  every 
N-tuple  of  points  in  the  model,  a  hash  location  is  computed,  and  an 
entry  is  recorded  in  the  space  of  invariants  at  that  location.  The 
entry  is  tagged  with  the  information  concerning  the  model  identity 
and  model  features  that  were  used  to  compute  the  position. 

What information is tagged at an entry? As a minimum, the entry records its location in the space of invariants and the identity of the model that was used to create the entry (i.e. model $m$). Typically the entry will also contain information allowing one to deduce the tuple of features from model $m$, $(f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_N})$, that was used to compute the location. Clearly it is sufficient to record the indices $(j_1, j_2, \ldots, j_N)$. Further information might be recorded with the entry. For example, we could attach an error model giving the covariance of the expected distribution of the entry in the space of invariants as instances of the model in a typical image are subjected to noise. In Flynn's terminology the tagged information at an entry is called a proto-hypothesis.¹ A sketch of the preprocessing phase is depicted in Figure 7.1. More formally, we define a hash entry as follows:

Definition: An entry $\omega$ is a tuple of a model index and indices of an $N$-tuple of model points, $\omega = [m, [j_1, j_2, \ldots, j_N]]$, with a position denoted $\xi(\omega) = h(f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_N})$ and a tag $t(\omega)$, which is typically the information $[m, [j_1, j_2, \ldots, j_c]]$ but might include additional information. □
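A minimal sketch of the preprocessing phase follows, for the nominal case of point features with $c = 2$, $d = 1$ (so $N = 3$) and similarity invariants with the origin at the basis midpoint. For simplicity it uses distinct indices only. Instead of the k-D tree mentioned above, it stores entries in a dictionary of square bins whose size is a hypothetical parameter `bin_size`. Each stored entry carries its position and the tag $[m, [j_1, j_2, j_3]]$. All function and parameter names are illustrative only.

```python
import math
from collections import defaultdict
from itertools import permutations


def invariants(p1, p2, p):
    """Similarity invariants of p w.r.t. the basis (p1, p2), origin at the midpoint."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    bx = p[0] - (p1[0] + p2[0]) / 2.0
    by = p[1] - (p1[1] + p2[1]) / 2.0
    n2 = dx * dx + dy * dy
    return (bx * dx + by * dy) / n2, (by * dx - bx * dy) / n2


def bin_of(pos, bin_size):
    return (math.floor(pos[0] / bin_size), math.floor(pos[1] / bin_size))


def build_table(models, bin_size=0.25):
    """Preprocessing: one entry per ordered triple (j1, j2, j3) of distinct model points.

    models maps a model id to its list of (x, y) points; each entry is stored as
    (position, model_id, (j1, j2, j3)) in the bin that contains its position."""
    table = defaultdict(list)
    for m, pts in models.items():
        for j1, j2, j3 in permutations(range(len(pts)), 3):
            pos = invariants(pts[j1], pts[j2], pts[j3])
            table[bin_of(pos, bin_size)].append((pos, m, (j1, j2, j3)))
    return table


def nearby(table, pos, radius, bin_size=0.25):
    """Entries within `radius` of pos; bin_size must match the one used to build the table."""
    reach = math.ceil(radius / bin_size)
    kx, ky = bin_of(pos, bin_size)
    return [entry
            for ix in range(kx - reach, kx + reach + 1)
            for iy in range(ky - reach, ky + reach + 1)
            for entry in table.get((ix, iy), [])
            if math.dist(entry[0], pos) <= radius]


table = build_table({"toy": [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (0.0, 2.0)]})
print(sum(len(b) for b in table.values()))      # 24 = 4 * 3 * 2 entries
print(nearby(table, (0.0, 0.5), 0.01))
```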

The second phase is the recognition phase (on-line). We are presented with an image, and $S$ features $\{p_1, p_2, \ldots, p_S\}$ are extracted. We then initiate a search in the image. Ultimately, subcollections of $N$ features will be used to compute hash locations $h(p_{k_1}, p_{k_2}, \ldots, p_{k_N})$. Once again, repetitions of image features in the $N$-tuple might be allowed. For each such hash location we locate all nearby entries in the hash table and, based on the information in each entry, we generate one or more votes (see Figure 7.2). Each nearby entry generates a candidate interpretation. Specifically, if the entry $\omega = [m, [j_1, j_2, \ldots, j_N]]$ is tagged

¹ Flynn [32] calls a quantized bin in the space of invariants an entry. This use of the term clashes with our use of hash entry, which refers to a tagged point in the space of invariants. Thus Flynn's entry is our hash bin, and his proto-hypothesis is our hash entry.
Figure 7.2 The recognition phase and voting process of Bayesian geometric hashing: using N-tuples of image features, locations in the space of invariants are computed, and nearby entries are accessed. Each entry is tagged with a model number and a set of model features, which can be paired with the image features used to compute the hash location to form a candidate interpretation. Interpretations are then given weighted votes.


with the information ‘(model $m$)’ and model points $(f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_N})$, then the hash $h(p_{k_1}, p_{k_2}, \ldots, p_{k_N})$ to a nearby location generates a candidate interpretation:

$$ \big[m,\, [f_{m,j_1}, p_{k_1}],\, [f_{m,j_2}, p_{k_2}],\, \ldots,\, [f_{m,j_N}, p_{k_N}]\big] \qquad (7.1) $$

In certain cases (for example, if the hash function is known to have certain symmetry properties) other interpretations involving the same sets of features might also be generated. Votes are generated for interpretations, or perhaps for sub-interpretations, meaning portions of an interpretation. For example, we might generate votes for model numbers (i.e. the index $m$ for the given entry) and then consider only models that receive sufficient votes in a subsequent search process. Alternatively, we may generate votes for matches of the form $[m, [f_{m,j_1}, p_{k_1}], [f_{m,j_2}, p_{k_2}], \ldots, [f_{m,j_c}, p_{k_c}]]$ and later verify candidate matches that receive sufficient votes. In between, it is possible to generate votes for sub-matchings, optionally together with votes in Hough space for transformation parameter values. Our practice, however, is to generate votes for full interpretations of the form of Eqn. 7.1. If two distinct collections of $N$ image points vote for the same entry, then the two resulting interpretations are incompatible and arbitration must be performed. The most convenient time to perform the arbitration is when the hash takes place, since both candidate interpretations may be “stored” at the hash table entry.

A single vote for a single interpretation may not in itself constitute much evidence for the existence of a model. Two interpretations are compatible, however, if they agree on the pairings of model features with image features and do not disagree on other features. Accordingly, after collecting votes for multiple interpretations, we will want to collapse the votes to discover multiple support for common sub-interpretations. Sub-interpretations with $c$ pairs of distinct features (i.e. matchings) are particularly desirable, since they indicate a unique matching between a model and an image object together with the approximate transformation.

Search Strategies. Given an image, hash values and votes are generated by $N$-tuples of image points. For an image with $S$ points there are $O(S^N)$ such tuples. Each tuple can be used to compute a hash value and thus to locate nearby entries in the hash table. Each such entry can generate candidate interpretations, which can involve up to $N$ feature pairs each.

We wish to structure the search over the $O(S^N)$ tuples so as to quickly extract as many valid interpretations as possible. Note that the search space that occurs without the use of hashing is of size $O(M n^c S^c)$, so that hashing is more favorable when $M$ is large or, in general, when $S^d = o(M n^c)$. Of course, the trade-off depends on the effectiveness of the hash function and the typical number of interpretations that are generated during the search.
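To give a feeling for the orders of magnitude, take illustrative values of our own choosing, not taken from any experiment in this work. Let there be $M = 100$ models of $n = 20$ point features each, with $c = 2$ and $d = 1$ (so $N = 3$), and an image with $S = 50$ features. Then

$$ S^N = 50^3 = 1.25 \times 10^5, \qquad M n^c S^c = 100 \cdot 20^2 \cdot 50^2 = 10^8, \qquad \frac{S^N}{M n^c S^c} = \frac{S^d}{M n^c} = \frac{50}{4 \times 10^4} = 1.25 \times 10^{-3}. $$

In this example, hashing inspects about 1/800 as many tuples as exhaustive matching, before accounting for the number of entries retrieved per hash.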

The best search strategy will depend on the application and will also be considerably influenced by the possible use of multiple feature types. However, the following strategy is a generic one that structures the search and takes advantage of the possibility of “grouping” features in the image.

We iteratively choose basis sets of $c$ distinct features in the image. There are $O(S^c)$ such sets. For each basis set we perform what we call a probe. The basis sets may be chosen at random or may make use of grouping or other strategies. Ideally, the $c$ features chosen in the image all belong to a single object. On a parallel machine, multiple probes may be performed concurrently. In Section 8.2 we describe a randomized algorithm that allows us to discard a probe after only a fraction of the image features have been considered.

Entries will obtain weighted votes during the probe; at the start of the probe all weights are zero. For each probe we append $d$-tuples of image features to the chosen basis set $B$. For technical reasons we prefer to allow repetitions among the features in the $d$-tuple, although some applications may ban such repetitions. In any case, the $d$-tuples will be formed of image features that do not include any of the basis set. The hash function is applied to the $N$-tuple formed from $B$ and a $d$-tuple to compute a hash location. Nearby entries receive weighted votes. If an entry has already received a non-zero vote during this probe, then a collision occurs, since distinct tuples of image features are competing for the same $N$-tuple of model points. We accept the vote with the closer² hash location, which typically means the maximum vote. After all (or many) $d$-tuples of image features are processed, each entry $\omega$ will have a weight $z(\omega)$, and we are ready to accumulate votes for matchings. An entry

$$ \omega = [m, [j_1, j_2, \ldots, j_c, j_{c+1}, \ldots, j_N]] $$

adds its vote to a model/basis accumulator indexed by $[m, [j_1, j_2, \ldots, j_c]]$. This accumulator implicitly represents the interpretation that the chosen basis set $B$ matches the $c$ model features used as a basis in the computation of the corresponding entry. Note that the index of the accumulator to which the votes are collapsed is the same as the data that is usually included in the tag $t(\omega)$. We may write:

$$ Z(m, [j_1, \ldots, j_c], B) = \sum_{j_{c+1}, \ldots, j_N} z\big([m, [j_1, \ldots, j_c, j_{c+1}, \ldots, j_N]]\big). $$
If for a given probe no model/basis combination receives a sufficient number of votes, then we deem that there is no valid interpretation and continue with another probe (a different basis set) in the image. If instead there are model/basis combinations with a substantial weighted vote, then each such implicit matching should be verified.
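The probe just described can be sketched in a few lines. The Python code below is an illustration under several assumptions of our own. It uses point features with $c = 2$, $d = 1$ and similarity invariants with the origin at the basis midpoint. It scans a plain list of entries linearly instead of querying a k-D tree. Each vote has weight $\exp(-d^2/2s^2)$, where the isotropic scale $s$ is a hypothetical parameter `scale` standing in for Mahalanobis-based weights. Collisions keep the larger weight, and entry weights are collapsed into accumulators indexed by $[m, [j_1, j_2]]$.

```python
import math
from collections import defaultdict
from itertools import permutations


def invariants(p1, p2, p):
    """Similarity invariants of p w.r.t. the basis (p1, p2), origin at the midpoint."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    bx, by = p[0] - (p1[0] + p2[0]) / 2.0, p[1] - (p1[1] + p2[1]) / 2.0
    n2 = dx * dx + dy * dy
    return (bx * dx + by * dy) / n2, (by * dx - bx * dy) / n2


def build_entries(models):
    """All entries ((m, (j1, j2, j3)), position) for ordered triples of distinct points."""
    return [((m, (j1, j2, j3)), invariants(pts[j1], pts[j2], pts[j3]))
            for m, pts in models.items()
            for j1, j2, j3 in permutations(range(len(pts)), 3)]


def probe(entries, image, basis, scale=0.05, radius=0.15):
    """One probe with image basis B = (k1, k2); returns accumulators Z[(m, (j1, j2))]."""
    k1, k2 = basis
    z = {}                                           # weight z(omega) of each entry
    for k, pk in enumerate(image):
        if k in basis:                               # d-tuples exclude the basis features
            continue
        pos = invariants(image[k1], image[k2], pk)
        for omega, epos in entries:
            dist = math.dist(pos, epos)
            if dist <= radius:
                w = math.exp(-0.5 * (dist / scale) ** 2)
                z[omega] = max(z.get(omega, 0.0), w)  # collision: keep the closer hash
    acc = defaultdict(float)
    for (m, js), w in z.items():
        acc[(m, js[:2])] += w                        # sum over j3 (j_{c+1}, ..., j_N)
    return dict(acc)


model = [(0.0, 0.0), (3.0, 0.0), (1.0, 2.0), (0.0, 1.0), (2.0, 3.0)]
c, s = math.cos(0.5), math.sin(0.5)
image = [(2 * (c * x - s * y) + 10, 2 * (s * x + c * y) + 5) for x, y in model]
image += [(14.0, 9.0), (7.5, 12.0)]                  # clutter points
votes = probe(build_entries({"toy": model}), image, basis=(0, 1))
print(max(votes, key=votes.get), votes[("toy", (0, 1))])
```

Here the image is a rotated, scaled and translated copy of the model plus two clutter points. The accumulator for the correct model basis should therefore collect one near-unit vote from each of the three remaining model points.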

If all $O(S^c)$ probes are performed and $O(S^d)$ different tuples are used in each probe, then a complete search of $O(S^N)$ will be invoked. However, by organizing the search over basis sets in the image, votes may be accumulated for interpretations that involve exactly $c$ pairs of features, and basis sets in the image that are likely to lead to recognition may be favored over completely random selections. Further, an object is typically recognized once any combination of $c$ features on the object is selected as a basis. Since there is a $\binom{n}{c}$-fold redundancy in the representation of the objects, there is a concomitant expected speed-up.

² A Mahalanobis distance metric is used here.

7.2 Updating Formulas and Conditional Independence

Let $\mathcal{H}$ be a hypothesis such as “object O is present in a designated region of the image.” Let $\mathcal{E}_1$ and $\mathcal{E}_2$ be two pieces of evidence that have an impact on the probability of $\mathcal{H}$. In what follows we assume that $\mathcal{E}_1$ and $\mathcal{E}_2$ are boolean events, although extensions to predicates involving continuous-valued measurements are straightforward. Our interest is to compute

$$ \Pr(\mathcal{H} \mid \mathcal{E}_1, \mathcal{E}_2). $$

Let us assume that we know in advance the values of $\Pr(\mathcal{H} \mid \mathcal{E}_1)$ and $\Pr(\mathcal{H} \mid \mathcal{E}_2)$ and the prior probability $\Pr(\mathcal{H})$.

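In general nothing can be said: the functional dependence of $\Pr(\mathcal{H} \mid \mathcal{E}_1, \mathcal{E}_2)$ on the values $\Pr(\mathcal{H} \mid \mathcal{E}_1)$ and $\Pr(\mathcal{H} \mid \mathcal{E}_2)$ can be anything. However, if we make the additional assumption of conditional independence of the events $\mathcal{E}_1$ and $\mathcal{E}_2$, i.e.

$$ \Pr(\mathcal{E}_1 \mid \mathcal{E}_2, \mathcal{H}) = \Pr(\mathcal{E}_1 \mid \mathcal{H}), $$

or equivalently $\Pr(\mathcal{E}_1, \mathcal{E}_2 \mid \mathcal{H}) = \Pr(\mathcal{E}_1 \mid \mathcal{H}) \cdot \Pr(\mathcal{E}_2 \mid \mathcal{H})$, then we may deduce that

$$ \frac{\Pr(\mathcal{H} \mid \mathcal{E}_1, \mathcal{E}_2)}{\Pr(\mathcal{H})} = K \cdot \frac{\Pr(\mathcal{H} \mid \mathcal{E}_1)}{\Pr(\mathcal{H})} \cdot \frac{\Pr(\mathcal{H} \mid \mathcal{E}_2)}{\Pr(\mathcal{H})} $$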
where $K$ is a constant independent of $\mathcal{H}$ and equal to

$$ K = \frac{\Pr(\mathcal{E}_1) \cdot \Pr(\mathcal{E}_2)}{\Pr(\mathcal{E}_1, \mathcal{E}_2)}. $$

The formulas follow directly from Bayes' theorem for conditional probabilities. If we additionally assume unconditional independence of the events $\mathcal{E}_1$ and $\mathcal{E}_2$, i.e. $\Pr(\mathcal{E}_1 \mid \mathcal{E}_2) = \Pr(\mathcal{E}_1)$, then we obtain $K = 1$.
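The update formula is easy to check numerically. The Python sketch below is our own and uses made-up probabilities and hypothesis labels. It builds a joint distribution over a hypothesis and two boolean evidence events that are, by construction, conditionally independent given the hypothesis. It then compares both sides of the update formula. Here $K$ differs from 1 because the two events are not unconditionally independent.

```python
from itertools import product


def check_update(prior, like1, like2):
    """prior[h] = Pr(H = h), like1[h] = Pr(E1 | h), like2[h] = Pr(E2 | h).

    Returns K and, per hypothesis, the two sides of
    Pr(h | E1, E2) / Pr(h) = K * [Pr(h | E1) / Pr(h)] * [Pr(h | E2) / Pr(h)]."""
    joint = {}
    for h, e1, e2 in product(prior, (True, False), (True, False)):
        q1 = like1[h] if e1 else 1.0 - like1[h]
        q2 = like2[h] if e2 else 1.0 - like2[h]
        joint[(h, e1, e2)] = prior[h] * q1 * q2      # conditional independence given h

    def pr(event):
        return sum(p for key, p in joint.items() if event(*key))

    p_e1 = pr(lambda h, a, b: a)
    p_e2 = pr(lambda h, a, b: b)
    p_e12 = pr(lambda h, a, b: a and b)
    k = p_e1 * p_e2 / p_e12
    sides = {}
    for h in prior:                                  # lambdas below are evaluated immediately
        post12 = pr(lambda g, a, b: g == h and a and b) / p_e12
        post1 = pr(lambda g, a, b: g == h and a) / p_e1
        post2 = pr(lambda g, a, b: g == h and b) / p_e2
        sides[h] = (post12 / prior[h], k * (post1 / prior[h]) * (post2 / prior[h]))
    return k, sides


k, sides = check_update({"chair": 0.3, "no chair": 0.7},
                        {"chair": 0.8, "no chair": 0.2},
                        {"chair": 0.6, "no chair": 0.3})
print(k, sides)
```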

Unconditional independence neither implies nor is implied by conditional independence. Accordingly, unconditional independence involves an additional assumption, which is often unjustified and in any case is typically unnecessary. The reason is that we are usually interested in the comparative probabilities of a sequence of propositions $\mathcal{H}_1, \mathcal{H}_2, \ldots$ based on the evidence $\mathcal{E}_1, \mathcal{E}_2$. We are interested in the comparative probabilities $\Pr(\mathcal{H}_k \mid \mathcal{E}_1, \mathcal{E}_2)$, and since the constant of proportionality $K$ is independent of $\mathcal{H}_k$, we are not concerned with its value. Note, however, that we now require a set of conditional independence assumptions, one for each hypothesis, of the form

$$ \Pr(\mathcal{E}_1 \mid \mathcal{E}_2, \mathcal{H}_k) = \Pr(\mathcal{E}_1 \mid \mathcal{H}_k). $$

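The theory is usually applied when one is given a sequence of evidence. Thus if $\mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3, \ldots$ is a sequence of pieces of evidence, we have

$$ \frac{\Pr(\mathcal{H}_k \mid \mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3, \ldots)}{\Pr(\mathcal{H}_k)} = K \prod_j \frac{\Pr(\mathcal{H}_k \mid \mathcal{E}_j)}{\Pr(\mathcal{H}_k)} $$

under the assumption that the conditional independence relations

$$ \Pr(\mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3, \ldots \mid \mathcal{H}_k) = \prod_i \Pr(\mathcal{E}_i \mid \mathcal{H}_k), \qquad k = 1, 2, \ldots $$

hold. The constant of proportionality $K$ is independent of $\mathcal{H}_k$ and equals

$$ K = \frac{\prod_i \Pr(\mathcal{E}_i)}{\Pr(\mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3, \ldots)}. $$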
There are many different ways in which the conditional independence relations can be satisfied. However, pairwise conditional independence relations alone are insufficient. Instead we need a sequence of relations. The following relations suffice:

$$ \Pr(\mathcal{E}_j \mid \mathcal{E}_{j_1}, \mathcal{E}_{j_2}, \ldots, \mathcal{E}_{j_r}, \mathcal{H}_k) = \Pr(\mathcal{E}_j \mid \mathcal{H}_k), \qquad r \ge 1,\; j \notin \{j_1, j_2, \ldots, j_r\}. \qquad (7.2) $$

That is, given $\mathcal{H}_k$, each piece of evidence must be independent of the union of any other pieces of evidence.

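It is common to use logarithms rather than the probabilities themselves in order to turn products into sums. Thus, given the conditional independence assumptions of Eqn. 7.2, we have

$$\log \frac{\Pr(\mathcal{H}_k \mid \mathcal{E}_1, \mathcal{E}_2, \ldots)}{\Pr(\mathcal{H}_k)} = \log K + \sum_i \log \frac{\Pr(\mathcal{H}_k \mid \mathcal{E}_i)}{\Pr(\mathcal{H}_k)} \tag{7.3}$$

where the bias term $\log K$ is independent of the hypothesis $\mathcal{H}_k$. Using Bayes' theorem once more, Eqn. 7.3 may be rewritten as

$$\log \Pr(\mathcal{H}_k \mid \mathcal{E}_1, \mathcal{E}_2, \ldots) = \log K + \log \Pr(\mathcal{H}_k) + \sum_i \log \frac{\Pr(\mathcal{E}_i \mid \mathcal{H}_k)}{\Pr(\mathcal{E}_i)} \tag{7.4}$$

which is the form in which we will use incremental probability assessment.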
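Here is a short sketch of incremental probability assessment in the form of Eqn. 7.4. It again uses a hypothetical model with made-up likelihoods: three hypotheses and four conditionally independent binary pieces of evidence. Each piece of evidence adds $\log(\Pr(\mathcal{E}_i \mid \mathcal{H}_k)/\Pr(\mathcal{E}_i))$ to a running score. The bias $\log K$ is never computed, and renormalizing the exponentiated scores recovers the exact posterior.

```python
import math

# Hypothetical toy model (illustration only): prior Pr(H) and Pr(E_i = 1 | H)
# for four binary pieces of evidence, assumed conditionally independent given H.
PRIOR = {"H1": 0.2, "H2": 0.5, "H3": 0.3}
LIK = {"H1": [0.9, 0.7, 0.8, 0.6],
       "H2": [0.3, 0.4, 0.5, 0.5],
       "H3": [0.6, 0.2, 0.7, 0.9]}


def lik(h, i, e):
    """Pr(E_i = e | H = h)."""
    return LIK[h][i] if e else 1.0 - LIK[h][i]


def evidence_prob(i, e):
    """Pr(E_i = e), summed over hypotheses."""
    return sum(PRIOR[h] * lik(h, i, e) for h in PRIOR)


def incremental_scores(evidence):
    """log Pr(H) + sum_i log(Pr(E_i | H) / Pr(E_i)), updated one piece at a time.

    By Eqn. 7.4 this equals log Pr(H | E_1, ..., E_r) - log K, so the scores
    differ from the log posteriors only by a hypothesis-independent bias."""
    scores = {h: math.log(p) for h, p in PRIOR.items()}
    for i, e in enumerate(evidence):
        p_e = evidence_prob(i, e)
        for h in scores:
            scores[h] += math.log(lik(h, i, e) / p_e)
    return scores


def normalize(scores):
    """Exponentiate and renormalize; this discards the unknown bias log K."""
    top = max(scores.values())
    w = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(w.values())
    return {h: v / total for h, v in w.items()}


def exact_posterior(evidence):
    """Pr(H | E_1, ..., E_r) computed directly from the joint distribution."""
    joint = {h: PRIOR[h] * math.prod(lik(h, i, e) for i, e in enumerate(evidence))
             for h in PRIOR}
    total = sum(joint.values())
    return {h: v / total for h, v in joint.items()}


obs = (1, 0, 1, 1)
scores = incremental_scores(obs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

For the observation `(1, 0, 1, 1)` the ranking is `["H3", "H1", "H2"]`, the same order as the exact posterior.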


7.3  Reasoning  with  Parts 


Suppose we are looking for a chair in the image of a room. We know that a chair consists of several parts: legs, a seat and a chair-back. An appealing strategy is to search separately for legs, seats and backs, and then to put the parts together to find instances of chairs. This approach combines the bottom-up processing of feature extraction and simple object detection with top-down reasoning about the components of chairs. We readily arrive at a situation where we have many candidate legs, seats and chair-backs. Further, it is customary to have probabilities or degrees of certainty attached to each such detection. When we combine the parts to locate instances of the chair object, it is desirable to combine the probabilities or certainty levels into a single estimate of the likelihood of each detection of a chair.

The difficulty arises from the fact that the parts of the chair are not independent. Indeed, having four candidate legs in the appropriate positions makes the likelihood of a chair much greater than the probability based on any single leg alone. We might be very unsure about any single leg, but the confluence of the four legs makes the presence of a chair at that location very likely. Conditional independence is also violated: under the assumption that the object is a chair, the presence of a leg in one spot makes the presence of a leg in other predictable locations much more probable.

If we discard positional information, then a certain level of conditional independence is restored. Under the assumption that a chair is present, the information that a leg is present tells us nothing new about the probability of the presence of another leg (or chair-back or seat), since the conditional hypothesis already tells us that these legs are present. There are two problems with this “solution.” First, it is precisely this geometric information that we wish to use in order to accurately deduce the location and presence of objects. Second, we need conditional independence for all possible objects in our database. Thus, although the presence of chair legs might be conditionally independent under the assumption of the presence of a chair, under the assumption of the presence of a sofa the presence of a chair leg makes the presence of another chair leg much more probable.
Despite these difficulties, there have been successful object recognition systems that reason about parts, using heuristic formulas to combine evidence that accumulates as subparts are found in appropriate positions [28, 67].

We next show that in the geometric hashing situation, where we have a fixed basis set and simple features, the individual features in the image are conditionally independent.

7.4  Conditional  Independence 

Let us return to the formulation of object recognition given in section 2.2.1. We will assume that a basis tuple comprising $c$ image features is chosen and fixed:

$$B = \{p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}\}.$$

The hypothesis is formed by a matching involving $B$. That is, a hypothesis will be formed by the predicate that a matching can be extended to a valid interpretation, i.e. that

$$\left[m, [f_{mj_1}, p_{\mu_1}], [f_{mj_2}, p_{\mu_2}], \ldots, [f_{mj_c}, p_{\mu_c}]\right]$$

can be extended to an interpretation involving $r$ pairs, $r \ge c$. Note that a hypothesis involves the choice of a model index $m$, a selection of $c$ features from the model, $(f_{mj_1}, f_{mj_2}, \ldots, f_{mj_c})$, and an image basis $B$. Accordingly, we will denote this hypothesis by

$$\mathcal{H}(m, [j_1, j_2, \ldots, j_c], B).$$

There is a total of $O(Mn^c)$ possible hypotheses for a given fixed basis tuple $B$ ($M$ models, each with $n$ features).
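A small enumeration makes the hypothesis count concrete. The sketch below assumes that the selection of model features is ordered, as the pairing of $[j_1, \ldots, j_c]$ with the ordered basis tuple $B$ suggests. The function names are illustrative only. Under that assumption the exact count is $M\,n!/(n-c)!$, which is $O(Mn^c)$ for fixed $c$.

```python
from itertools import permutations
from math import perm


def enumerate_hypotheses(num_models, n, c):
    """Yield (m, (j_1, ..., j_c)): a model index and an ordered selection of c
    distinct model features to pair with the fixed image basis B."""
    for m in range(num_models):
        for js in permutations(range(n), c):
            yield m, js


def hypothesis_count(num_models, n, c):
    """Closed form M * n!/(n - c)!, which is O(M n^c) for fixed c."""
    return num_models * perm(n, c)


# Example: 2 models with 5 features each and an affine basis of c = 3 points.
count = hypothesis_count(2, 5, 3)   # 2 * 5 * 4 * 3 = 120
```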

The evidence for any particular hypothesis will be based on the features of the image other than the basis set:

$$S' = S - B = \{p_l \mid l \notin \{\mu_1, \mu_2, \ldots, \mu_c\}\}.$$
We will only use the evidence that comes from the hash locations

$$\xi_{\nu_1, \nu_2, \ldots, \nu_d} = h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, p_{\nu_1}, p_{\nu_2}, \ldots, p_{\nu_d}),$$

where $\{p_{\nu_1}, p_{\nu_2}, \ldots, p_{\nu_d}\} \subset S'$. The first $c$ features are all distinct, whereas the final $d$ features may repeat. Other pieces of evidence are potentially available, for example from permutations of the arguments to the hash function. We will not use such permutations, and, depending on the symmetries exhibited by the hash function, such information will usually be redundant.³ Thus the information is formed by applying the hash function to the fixed basis tuple together with $d$-tuples of features in the image.⁴ When $d = 1$ this means that there are $|S| - c$ pieces of evidence. In general there are $(|S| - c)^d$ pieces of evidence available, assuming that all permutations of the arguments $p_{\nu_1}, p_{\nu_2}, \ldots, p_{\nu_d}$ yield extra information. Each piece of evidence can be regarded as a realization of a random vector that can be defined as

$$Y = h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, X_1, X_2, \ldots, X_d),$$

where the various $X_i$ are independent random vectors taking values in $S'$.⁵ When we speak of conditional independence of the evidence, we refer to information that comes from realizations of this random vector, and not to the independence of this random vector from other random processes.
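As an illustration of the $d = 1$ evidence, the sketch below assumes point features and an affine-invariant hash with $c = 3$. This is a common choice in geometric hashing and is used here only as an example. The hash of $r$ is the pair $(\alpha, \beta)$ with $r - p_{\mu_1} = \alpha(p_{\mu_2} - p_{\mu_1}) + \beta(p_{\mu_3} - p_{\mu_1})$. The hypothetical helper `evidence_hashes` returns one hash location per feature of $S'$, that is, $|S| - c$ pieces of evidence.

```python
def affine_hash(p0, p1, p2, r):
    """(alpha, beta) with r - p0 = alpha (p1 - p0) + beta (p2 - p0).

    These coordinates are unchanged by any nonsingular affine map of the
    plane, so they can serve as h(p_mu1, p_mu2, p_mu3, r) for d = 1."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p0[0], p2[1] - p0[1]
    rx, ry = r[0] - p0[0], r[1] - p0[1]
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("degenerate (collinear) basis")
    # Cramer's rule for [a b] (alpha, beta)^T = r - p0.
    return (rx * by - ry * bx) / det, (ax * ry - ay * rx) / det


def evidence_hashes(image_points, basis_idx):
    """Hash locations xi_nu = h(p_mu1, p_mu2, p_mu3, p_nu) for every p_nu in
    S' = S - B (d = 1); basis_idx holds the indices (mu1, mu2, mu3) of B."""
    p0, p1, p2 = (image_points[i] for i in basis_idx)
    return {nu: affine_hash(p0, p1, p2, p)
            for nu, p in enumerate(image_points) if nu not in basis_idx}


S = [(0, 0), (4, 0), (0, 3), (1, 1), (2, 2.5), (3.5, 0.5)]   # made-up image points
xi = evidence_hashes(S, (0, 1, 2))                          # |S| - c = 3 pieces of evidence
```

Applying the same nonsingular affine map to all image points leaves every returned location unchanged. This invariance is what makes the hash usable as evidence about a model seen under an unknown transformation.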

To deduce that the evidence is conditionally independent, we need one further assumption: that the entries in the space of invariants do not “clump.” That is, no two entries of the same model with a fixed basis set (i.e. the same first $c$ model features) land near one another in the space of invariants. Ideally, for a fixed basis tuple in the first $c$ parameters, the hash function is injective on the $d$-tuple of features, so that the condition amounts to saying that model features are distinct and separated. This is typically the case for a transformation-specific hash function. If more than one model feature lands in approximately the same location, then the features should be coalesced into a single feature, or the hash function should be modified.

³ For example, if $h$ is transformation-specific, then permutations carry no added information.

⁴ Recall that $d$ is the number of extra features.

⁵ Note that it can happen that some of the $X_i$ realize the same feature.
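The no-clumping assumption can be checked directly on a model database. The sketch below uses the same kind of hypothetical affine hash, with $c = 3$ and $d = 1$. It builds the entries $(\text{tag}, k, \xi(\omega))$ for every ordered, non-collinear basis and reports, per tag, the smallest distance between two entries. The threshold `radius` stands in for the positional noise scale; it is an assumption, not a value from the text.

```python
import math
from collections import defaultdict
from itertools import permutations


def affine_coords(p0, p1, p2, r):
    """(alpha, beta) with r - p0 = alpha (p1 - p0) + beta (p2 - p0); None if collinear basis."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p0[0], p2[1] - p0[1]
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        return None
    rx, ry = r[0] - p0[0], r[1] - p0[1]
    return ((rx * by - ry * bx) / det, (ax * ry - ay * rx) / det)


def build_entries(models):
    """Hash-table entries (tag, k, location) for d = 1 with tag = (m, (j1, j2, j3))."""
    entries = []
    for m, feats in enumerate(models):
        for js in permutations(range(len(feats)), 3):
            p0, p1, p2 = (feats[j] for j in js)
            if affine_coords(p0, p1, p2, p0) is None:   # collinear basis: skip it
                continue
            for k, f in enumerate(feats):
                if k not in js:
                    entries.append(((m, js), k, affine_coords(p0, p1, p2, f)))
    return entries


def tag_separations(entries):
    """For each tag, the smallest distance between two of its entries."""
    by_tag = defaultdict(list)
    for tag, _, loc in entries:
        by_tag[tag].append(loc)
    return {tag: min((math.dist(a, b) for i, a in enumerate(locs) for b in locs[i + 1:]),
                     default=math.inf)
            for tag, locs in by_tag.items()}


def clumped_tags(entries, radius):
    """Tags whose entries come closer than `radius` (a hypothetical noise scale)."""
    return [tag for tag, sep in tag_separations(entries).items() if sep < radius]


model_a = [(0, 0), (4, 0), (0, 3), (1, 1), (3, 2)]        # made-up, well separated
model_b = [(0, 0), (4, 0), (0, 3), (2, 2), (2.001, 2)]    # two near-duplicate features
```

The model with two nearly coincident features (`(2, 2)` and `(2.001, 2)`) is flagged. This is the situation in which the text recommends coalescing the features or modifying the hash function.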

We claim that the evidence is conditionally independent. To see this, let $\mathcal{H}(m, [j_1, j_2, \ldots, j_c], B)$ be a hypothesis. The evidence is the set of hash locations

$$\Gamma = \left\{\xi_{\nu_1, \nu_2, \ldots, \nu_d} \mid \{p_{\nu_1}, p_{\nu_2}, \ldots, p_{\nu_d}\} \subset S'\right\}.$$

To show independence, we consider the probability of a hash to a given location under the assumption of the hypothesis $\mathcal{H}(m, [j_1, j_2, \ldots, j_c], B)$. We are also interested in the probability of a hash to the same location under the same hypothesis, given a subcollection $\Gamma' \subset \Gamma$ of other hash locations. Since the hash locations are continuous variables, the probabilities can be replaced by evaluations of the corresponding probability density functions. We denote the two functions by

$$f_{Y \mid \mathcal{H}}(\xi) \quad \text{and} \quad f_{Y \mid \Gamma', \mathcal{H}}(\xi),$$

respectively, where $\mathcal{H}$ abbreviates $\mathcal{H}(m, [j_1, j_2, \ldots, j_c], B)$.

Conditional independence of the evidence means the following. Given a single hash point $\xi_{\nu_1, \nu_2, \ldots, \nu_d}$ together with a collection $\Gamma'$ of other hash locations, all distinct,

$$f_{Y \mid \mathcal{H}}(\xi_{\nu_1, \nu_2, \ldots, \nu_d}) = f_{Y \mid \Gamma', \mathcal{H}}(\xi_{\nu_1, \nu_2, \ldots, \nu_d}).$$
Let us first consider $f_{Y \mid \mathcal{H}}(\xi_{\nu_1, \nu_2, \ldots, \nu_d})$. From our assumption we know that the $m$-th model is embedded in the image, with model features $f_{mj_1}, f_{mj_2}, \ldots, f_{mj_c}$ mapping to the image points $p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}$. Although some of the remaining points of model $m$ may be obscured, their presence is likely, so there is a large likelihood of obtaining hash values at locations corresponding to hashes of model $m$ with basis $f_{mj_1}, f_{mj_2}, \ldots, f_{mj_c}$. Indeed, at every location in the hash table where there is an entry with tag $[m, [j_1, j_2, \ldots, j_c]]$, an image hash to the same location is likely, since the hash function is transformation invariant and the corresponding model features occur in the image. Thus the probability density function of $h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, X_1, X_2, \ldots, X_d)$ will include spikes at the positions where these entries occur in the hash table. The spikes will be smeared out by possible noise in the image features. All the other hashes of $h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, X_1, X_2, \ldots, X_d)$ involve one or more of the random vectors $X_i$ taking on the value of an image feature that does not belong to model $m$. Their contribution to the density function is smoother and depends on the background distribution of features in the image. We will typically assume that this background distribution is uniform in the image, although in applications one might be able to solve analytically or empirically for the true distribution. In any case, this distribution adds to the spikes due to the model features embedded in the image. The result is a weighted superposition of the two distributions shown in Figures 7.3 and 7.4. As we show in what follows, an analytic expression can be derived for this distribution.

Next consider the value of the density function $f_{Y \mid \Gamma', \mathcal{H}}$. We are given a collection of points $\Gamma'$ in the space of invariants, and we know that the image has generated hashes to these locations. We also know that the model $m$ is embedded in the image with a known transformation. We are given a new hash location $\xi_{\nu_1, \nu_2, \ldots, \nu_d}$, generated by a distinct hash, and we ask whether knowledge of $\Gamma'$ changes the likelihood of the hash to this location.

Figure 7.3 The probability density function of hashes in the space of invariants which are generated by image features not belonging to the model that is embedded in the image.

Some of the hashes in $\Gamma'$ may confirm the presence of the model $m$ by occurring at or near the spikes in the density function $f_{Y \mid \mathcal{H}}$, that is, the spikes due to hashes involving only point features from the embedded model. However, since we already know from the hypothesis that the model is present, this information tells us nothing. Each such hash makes it much less likely that another hash will occur at the same location, since the combination of model image features that generated it is thereby “used.” But the hash $\xi_{\nu_1, \nu_2, \ldots, \nu_d}$ is known not to be among the hashes in $\Gamma'$, and hashes due to model points are separated. Therefore the suppression of the likelihood at those entry locations does not influence the density function evaluated at $\xi_{\nu_1, \nu_2, \ldots, \nu_d}$. The remaining information in $\Gamma'$ consists of hashes from features that are not part of model $m$, and these tell us nothing.⁶ Accordingly, the likelihood of the hash to location $\xi_{\nu_1, \nu_2, \ldots, \nu_d}$ is unchanged by knowledge of the hashes to locations in $\Gamma'$. In other words, the pieces of evidence in the hash locations $\Gamma$ are conditionally independent under the hypothesis $\mathcal{H}$, for all possible hypotheses.


⁶ We could potentially use the hashes from $\Gamma'$ to deduce the locations of some image features, provided there are enough hashes that utilize the same image features. Once we know the values of other image features, the density distribution is modified, either by spikes added at certain positions or by blurred submanifolds where hashes are somewhat more likely. However, extracting this information would require a sophisticated statistical analysis of $\Gamma'$, and it likely depends on a high dimensionality of the space of invariants. This dependence does not occur when $d = 1$, which is our application. We would be inclined to discount this dependence in other cases as well, but a more detailed analysis is required.
7.5  Density  Functions


To develop the Bayesian geometric hashing algorithm, we will need the density functions

$$f_{Y \mid \mathcal{H}}(\xi) \quad \text{and} \quad f_Y(\xi).$$

Recall that $Y$ is the random vector $h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, X_1, X_2, \ldots, X_d)$, where the $X_i$ independently take on image feature values in $S'$. That is, we must compute the density of hashes in the space of invariants obtained by applying the hash function to the fixed basis tuple concatenated with arbitrary $d$-tuples of image features chosen from $S'$. We need this density both with and without the assumption that the chosen basis participates in a valid interpretation of a model object. The first probability density function was discussed in the previous subsection; the second function is new.

We next provide more precise general formulas. In applications the density functions will have to be computed using specific noise models. Our treatment assumes that the map

$$T(r_1, r_2, \ldots, r_d) = h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, r_1, r_2, \ldots, r_d) \tag{7.5}$$

is a one-to-one differentiable map from $d$-tuples of image features to the space of invariants, and that the Jacobian of the transformation is nowhere singular. In this case $h$ is necessarily transformation-specific, but we are assuming more. Extensions to more general cases are possible.

Consider a point

$$\xi = T(r_1, r_2, \ldots, r_d) = h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, r_1, r_2, \ldots, r_d).$$
Using the stochastic independence of the $X_i$, the probability density function of $Y$ evaluated at $\xi$ can be written as

$$f_Y(\xi) = \frac{\prod_{i=1}^{d} f_X(r_i)}{\left|J_T(r_1, r_2, \ldots, r_d)\right|}.$$

Here $J_T$ is the Jacobian matrix of the transformation $T$, expressed as a function of $r_1, r_2, \ldots, r_d$ (see Eqn. 7.5), and $|J_T|$ is the absolute value of its determinant. The function $f_X$ is the probability density function for the set $S'$ of image features. A similar expression holds for $f_{Y \mid \mathcal{H}}(\xi)$:

$$f_{Y \mid \mathcal{H}}(\xi) = \frac{\prod_{i=1}^{d} f_{X \mid \mathcal{H}}(r_i)}{\left|J_T(r_1, r_2, \ldots, r_d)\right|}.$$

In this expression $f_{X \mid \mathcal{H}}$ is the probability density function of the set $S'$ of image features under the hypothesis $\mathcal{H}(m, [j_1, j_2, \ldots, j_c], B)$. This hypothesis implies the likely existence of image points at certain locations corresponding to points of the model $m$.
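The change-of-variables formula for $f_Y$ can be checked numerically in the simplest setting that fits the assumptions above. The sketch assumes $c = 2$, $d = 1$, and point features in a $W \times H$ image with uniform density $f_X = 1/(WH)$. As an illustrative choice, it uses the similarity-invariant hash $T(r) = (r - p_{\mu_1})/(p_{\mu_2} - p_{\mu_1})$, written with complex numbers. As a real $2 \times 2$ map this is a rotation combined with a scaling by $1/|p_{\mu_2} - p_{\mu_1}|$. Hence $|J_T| = 1/|p_{\mu_2} - p_{\mu_1}|^2$ and $f_Y(\xi) = f_X(T^{-1}(\xi))\,|p_{\mu_2} - p_{\mu_1}|^2$. A Monte Carlo helper compares this value against a histogram estimate.

```python
import random


def T(r, p1, p2):
    """Similarity-invariant coordinate of the point r w.r.t. the basis (p1, p2)."""
    return (r - p1) / (p2 - p1)


def T_inv(xi, p1, p2):
    """Inverse map: the image point whose invariant coordinate is xi."""
    return p1 + xi * (p2 - p1)


def abs_jacobian_T(p1, p2):
    """|det J_T|: z -> (z - p1)/(p2 - p1) rotates and scales by 1/|p2 - p1|."""
    return 1.0 / abs(p2 - p1) ** 2


def f_X_uniform(r, W, H):
    """Uniform feature density over the W x H image rectangle."""
    return 1.0 / (W * H) if 0 <= r.real <= W and 0 <= r.imag <= H else 0.0


def f_Y(xi, p1, p2, W, H):
    """Change of variables: f_Y(xi) = f_X(T^{-1}(xi)) / |J_T|."""
    return f_X_uniform(T_inv(xi, p1, p2), W, H) / abs_jacobian_T(p1, p2)


def monte_carlo_f_Y(xi0, p1, p2, W, H, n=100_000, half=0.25, seed=0):
    """Histogram estimate of f_Y at xi0 from uniformly sampled image points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        xi = T(complex(rng.uniform(0, W), rng.uniform(0, H)), p1, p2)
        if abs(xi.real - xi0.real) <= half and abs(xi.imag - xi0.imag) <= half:
            hits += 1
    return hits / (n * (2 * half) ** 2)


P1, P2 = 40 + 40j, 60 + 40j                              # hypothetical basis points
formula = f_Y(0.5 + 0.5j, P1, P2, 100, 100)              # about 400 / 10**4 = 0.04
estimate = monte_carlo_f_Y(0.5 + 0.5j, P1, P2, 100, 100)
```

With the basis `40+40j`, `60+40j` in a 100 × 100 image, the formula gives $f_Y = 400/10^4 = 0.04$ inside the image region. Doubling the basis length quadruples the value, which is the basis dependence through $|J_T|$.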

When $T$ is not a one-to-one mapping of $d$-tuples of features into the space of invariants, the revised formulas involve integrals on the right-hand sides. These integrals are taken over the submanifolds of the $d$-fold feature space on which $d$-tuples of features map to the given hash location.

We will typically assume that feature values are continuous and that all parameter values are equally likely. For example, when image features are points in a rectangular image, we will assume a uniform density over the image domain. Thus $f_X(r)$ is often constant over the range of feature values.

For $f_{X \mid \mathcal{H}}$ we typically have a mix of the background distribution $f_X$ and a set of smeared delta masses at the predicted positions $q_j$ of model points:

$$f_{X \mid \mathcal{H}}(r) = (1 - \beta)\, f_X(r) + \frac{\beta}{n - c} \sum_{j=1}^{n-c} g_\sigma(r - q_j),$$
where $g_\sigma(\cdot)$ is a Gaussian distribution representing the likely distribution of feature values around the image features $q_j$ predicted by hypothesis $\mathcal{H}$ from model $m$. Note that model $m$ predicts $n - c$ feature vectors for features in $S'$: these are the features of the model, less the basis set $\{f_{mj_1}, f_{mj_2}, \ldots, f_{mj_c}\}$, as embedded in the image. The coefficient $\beta$ is a weighting factor that models how many features of model $m$ are expected to actually occur among the image features $S'$, relative to the total number of image features in $S'$. For a given image feature $r$, at most one model feature in model $m$ can be “close” to $r$. Denote the closest such feature by $q_{j(r)}$. Since $g_\sigma$ typically falls off rapidly, the expression for $f_{X \mid \mathcal{H}}(r)$ can then be simplified:

$$f_{X \mid \mathcal{H}}(r) \approx (1 - \beta)\, f_X(r) + \frac{\beta}{n - c}\, g_\sigma(r - q_{j(r)}).$$

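In this case the resulting formula for $f_{Y \mid \mathcal{H}}$ has the form:

$$f_{Y \mid \mathcal{H}}(\xi) \approx \frac{\prod_{i=1}^{d} \left[(1 - \beta)\, f_X(r_i) + \frac{\beta}{n - c}\, g_\sigma(r_i - q_{j(r_i)})\right]}{\left|J_T(r_1, r_2, \ldots, r_d)\right|}.$$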

Qualitatively, the product is small unless all or most of the $r_i$ are near model features in the image. If they are all near model features, then $\xi$ is near an entry $\omega = [m, [j_1, j_2, \ldots, j_c, j(r_1), j(r_2), \ldots, j(r_d)]]$. Thus we expect peaks in $f_{Y \mid \mathcal{H}}(\xi)$ near entries having tags $[m, [j_1, j_2, \ldots, j_c]]$. There are other “surfaces” in the space of invariants where $f_{Y \mid \mathcal{H}}$ is similarly elevated, because all but one, or all but a few, of the $r_i$ fall close to model features $q_j$. These surfaces only appear when $d > 1$ and can most likely be ignored in most applications. To our knowledge, it is unexplored whether the surfaces could be useful for object recognition as described in the next subsection. Of course, this formula is only an example. Different applications will lead to different models for $f_{X \mid \mathcal{H}}$, and thus to different formulas for $f_{Y \mid \mathcal{H}}$.
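The mixture model for $f_{X \mid \mathcal{H}}$, its nearest-feature approximation, and the resulting $f_{Y \mid \mathcal{H}}$ for $d = 1$ can be sketched as follows. The sketch assumes that $g_\sigma$ is an isotropic 2-D Gaussian and that the background $f_X$ is a constant `f_x`. The predicted positions `Q` and all parameter values are invented for illustration. The hash is the hypothetical similarity-invariant $T(r) = (r - p_{\mu_1})/(p_{\mu_2} - p_{\mu_1})$, for which $|J_T| = 1/|p_{\mu_2} - p_{\mu_1}|^2$.

```python
import math


def g_sigma(v, sigma):
    """Isotropic 2-D Gaussian density at the offset v (a complex number)."""
    return math.exp(-abs(v) ** 2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def f_X_given_H(r, q, beta, f_x, sigma):
    """(1 - beta) f_X(r) + beta/(n - c) * sum_j g_sigma(r - q_j); q lists the
    n - c predicted image positions q_j and f_x is a constant background."""
    return (1 - beta) * f_x + beta / len(q) * sum(g_sigma(r - qj, sigma) for qj in q)


def f_X_given_H_nearest(r, q, beta, f_x, sigma):
    """The approximation that keeps only the nearest predicted feature q_j(r)."""
    q_near = min(q, key=lambda qj: abs(r - qj))
    return (1 - beta) * f_x + beta / len(q) * g_sigma(r - q_near, sigma)


def f_Y_given_H(xi, p1, p2, q, beta, f_x, sigma):
    """d = 1 similarity case: f_{Y|H}(xi) = f_{X|H}(T^{-1}(xi)) / |J_T|."""
    r = p1 + xi * (p2 - p1)
    return f_X_given_H(r, q, beta, f_x, sigma) * abs(p2 - p1) ** 2


# Hypothetical setup: 100 x 100 image, three predicted model points.
Q = [10 + 10j, 30 + 80j, 70 + 20j]
P1, P2 = 40 + 40j, 60 + 40j
params = dict(q=Q, beta=0.5, f_x=1e-4, sigma=1.0)
```

Near a predicted feature, the full and nearest-feature versions agree to machine precision. Far from all $q_j$, both reduce to the background term $(1 - \beta) f_X$.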

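We may summarize our exploration of the conditional density function $f_{Y \mid \mathcal{H}}$ by noting that we have given evidence for a decomposition theorem. It states that $f_{Y \mid \mathcal{H}}$ may be written as:

$$f_{Y \mid \mathcal{H}}(\xi) = \beta_1 f_Y(\xi) + \beta_2 \mathcal{S}(\xi) + \beta_3 \sum_{k_1 \in \mathcal{X}} \cdots \sum_{k_d \in \mathcal{X}} g\left(\xi - \xi([m, [j_1, j_2, \ldots, j_c, k_1, k_2, \ldots, k_d]])\right).$$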

Here $\mathcal{S}$ is a function that is large only on the “surfaces” described above and is generally neglected. The set $\mathcal{X}$ contains the indices of $m$'s model features that are not used in the model basis chosen by $\mathcal{H}$, i.e. $\mathcal{X} = \{1, 2, \ldots, n\} - \{j_1, j_2, \ldots, j_c\}$. In the third term, $g$ is a local density function that accounts for a smeared spike at the location of each entry $\omega = [m, [j_1, j_2, \ldots, j_c, k_1, k_2, \ldots, k_d]]$. These are all the entries in the hash table generated by model $m$ and the basis $[j_1, j_2, \ldots, j_c]$. Recall that the location of the entry $\omega$ is denoted by $\xi(\omega)$. The density function $g$ depends on the hypothesis $\mathcal{H}$ and on the error model for the image features. When $T$ is one-to-one, we have

$$g\left(T(r_1, r_2, \ldots, r_d)\right) = \frac{\prod_{i=1}^{d} g_\sigma(r_i - q_{j(r_i)})}{\left|J_T(r_1, r_2, \ldots, r_d)\right|}.$$
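For $d = 1$, in a hypothetical similarity setting ($T(r) = (r - p_{\mu_1})/(p_{\mu_2} - p_{\mu_1})$, isotropic Gaussian $g_\sigma$, constant background), the decomposition can be verified term by term. The coefficients are $\beta_1 = 1 - \beta$, $\beta_2 = 0$ (there are no surfaces when $d = 1$) and $\beta_3 = \beta/(n - c)$. The entries sit at $\xi(\omega_k) = T(q_k)$. The spike $g$ is again an isotropic Gaussian, now with standard deviation $\sigma/|p_{\mu_2} - p_{\mu_1}|$, because $|r - q_k| = |\xi - \xi(\omega_k)|\,|p_{\mu_2} - p_{\mu_1}|$. The sketch compares this decomposed form with the direct change-of-variables formula; all numbers are invented.

```python
import math


def g_sigma(v, sigma):
    """Isotropic 2-D Gaussian density at the complex offset v."""
    return math.exp(-abs(v) ** 2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def direct_f_Y_given_H(xi, p1, p2, q, beta, f_x, sigma):
    """f_{X|H}(T^{-1}(xi)) / |J_T| for T(r) = (r - p1)/(p2 - p1)."""
    r = p1 + xi * (p2 - p1)
    mix = (1 - beta) * f_x + beta / len(q) * sum(g_sigma(r - qj, sigma) for qj in q)
    return mix * abs(p2 - p1) ** 2          # dividing by |J_T| = 1/|p2 - p1|^2


def decomposed_f_Y_given_H(xi, p1, p2, q, beta, f_x, sigma):
    """beta1 f_Y(xi) + beta3 * sum_k g(xi - xi(omega_k)); beta2 = 0 for d = 1."""
    scale = abs(p2 - p1)
    f_y = f_x * scale ** 2                              # background density in invariant space
    entry_locs = [(qk - p1) / (p2 - p1) for qk in q]    # xi(omega_k) = T(q_k)
    beta1, beta3 = 1 - beta, beta / len(q)
    # The spike g is a Gaussian with standard deviation sigma / |p2 - p1|.
    return beta1 * f_y + beta3 * sum(g_sigma(xi - e, sigma / scale) for e in entry_locs)


Q = [10 + 10j, 30 + 80j, 70 + 20j]      # hypothetical predicted positions q_k
P1, P2 = 40 + 40j, 60 + 40j             # hypothetical image basis
value = decomposed_f_Y_given_H(-1.48 - 1.5j, P1, P2, Q, 0.5, 1e-4, 1.0)
```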


We note in passing that when $T$ is one-to-one and smooth, as assumed here, the density function measurements can be translated to feature space (as opposed to the space of invariants). The common factor $|J_T|$ then cancels in the ratio, so the log-density ratio

$$\log\left(\frac{f_{Y \mid \mathcal{H}}(\xi)}{f_Y(\xi)}\right)$$

can be measured as a sum of deviations in feature space (measured relative to $g_\sigma$) over the $d$-tuple of features $p_{\nu_1}, p_{\nu_2}, \ldots, p_{\nu_d}$. It is this log-density ratio that we will use for the Bayesian geometric hashing algorithm.

Note that the probability density function $f_Y$ can depend on the image basis $B = \{p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}\}$ through the $|J_T|$ term in the denominator of its defining equation. Alternatively, the same density function $f_Y$ can be used for every basis $B$. More specifically, we can compute, either empirically or analytically, the average density function over all possible basis selections (see section 7.10). In some cases, using a basis-dependent probability density function actually simplifies the formulas (see section 7.9).
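A small sketch shows both the cancellation of the Jacobian in the log-density ratio and the basis dependence of $f_Y$. To obtain a map from $d$-tuples, it assumes a hypothetical hash that concatenates per-feature similarity invariants, $T(r_1, \ldots, r_d) = (T(r_1), \ldots, T(r_d))$, so that $|J_T| = |p_{\mu_2} - p_{\mu_1}|^{-2d}$. For $f_{X \mid \mathcal{H}}$ it uses the Gaussian mixture given earlier in this section, with invented parameter values. The ratio computed in the space of invariants equals a sum over the $d$-tuple of per-feature terms $\log(f_{X \mid \mathcal{H}}(r_i)/f_X(r_i))$, in which $|J_T|$ no longer appears.

```python
import math


def g_sigma(v, sigma):
    """Isotropic 2-D Gaussian density at the complex offset v."""
    return math.exp(-abs(v) ** 2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def mixture(r, q, beta, f_x, sigma):
    """f_{X|H}(r): constant background f_x plus Gaussian spikes at the q_j."""
    return (1 - beta) * f_x + beta / len(q) * sum(g_sigma(r - qj, sigma) for qj in q)


def background_f_Y(p1, p2, f_x, d):
    """f_Y = prod_i f_X / |J_T| = f_x^d |p2 - p1|^(2d): it depends on the basis."""
    return f_x ** d * abs(p2 - p1) ** (2 * d)


def log_ratio_invariant_space(xis, p1, p2, q, beta, f_x, sigma):
    """log(f_{Y|H}(xi) / f_Y(xi)) evaluated with explicit Jacobians."""
    d = len(xis)
    abs_jac = abs(p2 - p1) ** (-2 * d)
    rs = [p1 + xi * (p2 - p1) for xi in xis]            # T^{-1}, one feature at a time
    f_y_h = math.prod(mixture(r, q, beta, f_x, sigma) for r in rs) / abs_jac
    return math.log(f_y_h / background_f_Y(p1, p2, f_x, d))


def log_ratio_feature_space(rs, q, beta, f_x, sigma):
    """Sum over the d-tuple of per-feature log ratios; no Jacobian is needed."""
    return sum(math.log(mixture(r, q, beta, f_x, sigma) / f_x) for r in rs)


Q = [10 + 10j, 30 + 80j, 70 + 20j]      # hypothetical predicted positions
P1, P2 = 40 + 40j, 60 + 40j             # hypothetical image basis
rs = [10.3 + 10.2j, 50 + 50j]           # a d = 2 tuple of image features
xis = [(r - P1) / (P2 - P1) for r in rs]
inv = log_ratio_invariant_space(xis, P1, P2, Q, 0.5, 1e-4, 1.0)
feat = log_ratio_feature_space(rs, Q, 0.5, 1e-4, 1.0)
```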


7.6  Bayesian  Geometric  Hashing 


We may now piece the various components together. Using Eqn. 7.4, we have

$$\log \Pr(\mathcal{H} \mid \Gamma) = \log K + \log \Pr(\mathcal{H}) + \sum_{\nu_1} \cdots \sum_{\nu_d} \log\left(\frac{f_{Y \mid \mathcal{H}}(\xi_{\nu_1, \nu_2, \ldots, \nu_d})}{f_Y(\xi_{\nu_1, \nu_2, \ldots, \nu_d})}\right).$$

Recall that $\Gamma$ is the set of hashes that arise from the image features $S'$, and that $\xi_{\nu_1, \nu_2, \ldots, \nu_d}$ are the locations of those hashes. We are typically only concerned with the relative strength of the probability of a given hypothesis, in order to find the top few hypotheses with the maximum support. We will also typically assume that all hypotheses are equally likely. We may therefore neglect the first two terms, which are then independent of $\mathcal{H}$, and simply compute the sum to obtain a total support weight $Z(\mathcal{H})$ for each hypothesis. We thus have

$$Z(\mathcal{H}) = \sum_{\nu_1} \cdots \sum_{\nu_d} \log\left(\frac{f_{Y \mid \mathcal{H}}(\xi_{\nu_1, \nu_2, \ldots, \nu_d})}{f_Y(\xi_{\nu_1, \nu_2, \ldots, \nu_d})}\right). \tag{7.6}$$
As written, the formula is not of much help, since the computational cost would be immense. The formula says that for every probe with a fixed basis $B$ of image features, we compute hash locations from $d$-tuples of image features and, for each hash location, increment the accumulators of every hypothesis. Recall that $\mathcal{H} = \mathcal{H}(m, [j_1, j_2, \ldots, j_c], B)$. Since there are $O(Mn^c)$ hypotheses and we may have to perform $O(|S|^c)$ probes, we achieve no speedup over classical search methods.
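The cost argument can be made concrete by counting accumulator updates. The sketch below counts ordered bases for the probes and uses sizes invented for illustration ($|S| = 50$, $M = 10$, $n = 20$, $c = 3$, $d = 1$). For contrast, `neighborhood_updates` counts the work if each hash touched only `avg_neighbors` nearby entries, where `avg_neighbors` is a hypothetical occupancy figure. This is the kind of saving that the decomposition theorem in the following paragraph makes possible.

```python
from math import perm


def naive_updates(num_image_features, num_models, n, c, d):
    """Accumulator updates if Eqn. 7.6 is evaluated literally: every hash of
    every probe updates every hypothesis (ordered bases are counted)."""
    probes = perm(num_image_features, c)                 # O(|S|^c)
    hashes_per_probe = (num_image_features - c) ** d     # (|S| - c)^d
    hypotheses = num_models * perm(n, c)                 # O(M n^c)
    return probes * hashes_per_probe * hypotheses


def neighborhood_updates(num_image_features, c, d, avg_neighbors):
    """Updates if each hash touches only the entries in its neighborhood;
    avg_neighbors is a hypothetical average, not a derived figure."""
    return perm(num_image_features, c) * (num_image_features - c) ** d * avg_neighbors


naive = naive_updates(50, 10, 20, 3, 1)
local = neighborhood_updates(50, 3, 1, avg_neighbors=5)
```

With these sizes, the literal evaluation needs $117600 \cdot 47 \cdot 68400 \approx 3.8 \times 10^{11}$ updates.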

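However, if we use our “decomposition theorem” for $f_{Y \mid \mathcal{H}}$ and ignore the surface masses, we obtain the following expression for $Z(\mathcal{H})$:

$$Z(\mathcal{H}) = \sum_{\nu_1} \cdots \sum_{\nu_d} \log\left(\beta_1 + \beta_3\, \frac{\sum_{k_1 \in \mathcal{X}} \cdots \sum_{k_d \in \mathcal{X}} g\left(\xi - \xi([m, [j_1, \ldots, j_c, k_1, \ldots, k_d]])\right)}{f_Y(\xi)}\right), \qquad \xi = \xi_{\nu_1, \nu_2, \ldots, \nu_d}.$$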

Recall that $\mathcal{X}$ is the set of available feature-point indices for computing an entry corresponding to model $m$ and basis $[j_1, j_2, \ldots, j_c]$.

Consider a fixed $\mathcal{H}$. Each hash $\xi_{\nu_1, \nu_2, \ldots, \nu_d}$ makes a contribution to $Z(\mathcal{H})$. If the hash does not fall near any entry with tag $[m, [j_1, j_2, \ldots, j_c]]$ (the tag associated with the hypothesis $\mathcal{H}$), then all $g$ terms may be ignored and the contribution is $\log(\beta_1)$. If the hash falls near one such entry, then by assumption it falls near only that one. This is because, for a given model/basis $[m, [j_1, j_2, \ldots, j_c]]$, the corresponding entries are separated. A single hash can therefore lie in the vicinity of at most one such entry, although the neighborhood may contain several entries with different tags, each contributing to a different hypothesis. Suppose that for the tag $[m, [j_1, j_2, \ldots, j_c]]$ the entry $\omega$ is in the neighborhood of the hash $\xi = \xi_{\nu_1, \nu_2, \ldots, \nu_d}$. Then the contribution to $Z(\mathcal{H})$ is

$$\log\left(\beta_1 + \beta_3\, \frac{g(\xi - \xi(\omega))}{f_Y(\xi)}\right).$$


We see that the criterion for a hash to be near an entry is that the second term inside the logarithm is significant relative to the first term. For particular applications, the criterion may be reinterpreted as a threshold on $g$.
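The nearness criterion can be turned into an explicit search radius. Assume that $g$ is an isotropic 2-D Gaussian with standard deviation $\sigma$, and introduce a hypothetical significance ratio $\tau$: the second term counts as significant when $\beta_3 g/f_Y \ge \tau \beta_1$. Solving $g(\delta) = \tau\beta_1 f_Y/\beta_3$ gives

$$\delta_{\max} = \sigma \sqrt{2 \ln\frac{\beta_3}{2\pi\sigma^2 \tau \beta_1 f_Y}}$$

whenever the argument of the logarithm exceeds one. A minimal Python version, with invented parameter values:

```python
import math


def g_gauss(delta, sigma):
    """Isotropic 2-D Gaussian density at distance delta from its center."""
    return math.exp(-delta ** 2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def near_radius(beta1, beta3, f_y, sigma, tau):
    """Largest |xi - xi(omega)| with beta3 * g / f_Y >= tau * beta1.

    g is assumed to be an isotropic 2-D Gaussian of standard deviation sigma,
    and tau is a hypothetical significance ratio. Returns None when even the
    peak of g fails the test, i.e. no hash can count as "near" the entry."""
    peak_ratio = beta3 / (2 * math.pi * sigma ** 2 * tau * beta1 * f_y)
    if peak_ratio <= 1.0:
        return None
    return sigma * math.sqrt(2.0 * math.log(peak_ratio))


rho = near_radius(beta1=0.9, beta3=0.1 / 17, f_y=0.04, sigma=0.05, tau=0.1)
```

Hashes farther than `rho` from an entry can then be ignored for that entry's tag, which is what allows a neighborhood query to replace a scan over all entries.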

The weighted voting may thus be organized as follows. Initially, every entry $\omega$ has weight $z(\omega) = 0$. For each hash $\xi = \xi_{\nu_1, \nu_2, \ldots, \nu_d}$, we use a k-D (k-dimensional) tree or a similar data structure to access the entries in its neighborhood. The weight

$$z(\omega) = \log\left(\beta_1 + \beta_3\, \frac{g(\xi - \xi(\omega))}{f_Y(\xi)}\right) - \log(\beta_1) = \log\left(1 + \frac{\beta_3\, g(\xi - \xi(\omega))}{\beta_1\, f_Y(\xi)}\right)$$

is computed and applied to $\omega$. If the entry already has a nonzero weight from a previous hash, then two distinct collections of $n$ scene features are vying for correspondence to a single group of $n$ features in a model. Only one of the two matchings can be correct, and we should not allow both to independently support the matching hypothesis $\mathcal{H}$. We therefore assign to $z(\omega)$ the maximum of the values computed from the competing hashes.
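Here is a minimal sketch of the weighted voting step. It assumes a 2-D space of invariants (locations stored as complex numbers) and an isotropic Gaussian $g$. In place of the k-D tree it uses a uniform grid, which the text's "similar data structure" allows. The function `entry_weights` is a hypothetical name. It computes $z(\omega) = \log(1 + \beta_3 g/(\beta_1 f_Y))$ for the entries in each hash's neighborhood and keeps the maximum when several hashes compete for one entry. The locations and parameters in the usage lines are invented.

```python
import math
from collections import defaultdict


def g_sigma(v, sigma):
    """Isotropic 2-D Gaussian density at the complex offset v."""
    return math.exp(-abs(v) ** 2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


class GridIndex:
    """Uniform-grid index over entry locations, a simple stand-in for a k-D
    tree. With cell size >= radius, a 3 x 3 block of cells covers every query."""

    def __init__(self, locations, cell):
        self.locations, self.cell = locations, cell
        self.buckets = defaultdict(list)
        for idx, loc in enumerate(locations):
            self.buckets[self._key(loc)].append(idx)

    def _key(self, loc):
        return (math.floor(loc.real / self.cell), math.floor(loc.imag / self.cell))

    def near(self, loc, radius):
        """Indices of entries within `radius` of loc."""
        kx, ky = self._key(loc)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for idx in self.buckets.get((kx + dx, ky + dy), ()):
                    if abs(self.locations[idx] - loc) <= radius:
                        yield idx


def entry_weights(entry_locs, hashes, beta1, beta3, f_y, sigma, radius):
    """z(omega) = log(1 + beta3 g(xi - xi(omega)) / (beta1 f_Y(xi))), keeping the
    maximum over competing hashes; entries that no hash reaches keep z = 0."""
    index = GridIndex(entry_locs, cell=radius)
    z = [0.0] * len(entry_locs)
    for xi in hashes:
        for idx in index.near(xi, radius):
            w = math.log1p(beta3 * g_sigma(xi - entry_locs[idx], sigma) / (beta1 * f_y(xi)))
            z[idx] = max(z[idx], w)
    return z


locs = [0.5 + 0.5j, 2.0 + 0.0j, 0.52 + 0.5j]      # hypothetical entry locations xi(omega)
hashes = [0.51 + 0.5j, 0.6 + 0.5j, 5.0 + 5.0j]    # hypothetical image hashes
z = entry_weights(locs, hashes, beta1=0.9, beta3=0.1 / 17,
                  f_y=lambda xi: 0.04, sigma=0.05, radius=0.2)
```

The grid cell size equals the query radius, so scanning the 3 × 3 block of cells around a hash is enough to find every entry within that radius.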

After all hashes are processed, we compute the total support values $Z(\mathcal{H})$ by accumulating the weights of entries with common tags. To account for the hashes that do not fall near entries with a given tag, each bucket starts with a bias term equal to the number of hashes times $\log(\beta_1)$. Assuming there are $(|S| - c)^d$ hashes, we have

$$Z(\mathcal{H}) = (|S| - c)^d \log(\beta_1) + \sum_{k_1 \in \mathcal{X}} \cdots \sum_{k_d \in \mathcal{X}} z([m, [j_1, j_2, \ldots, j_c, k_1, k_2, \ldots, k_d]]).$$

The net effect is that each hash implicitly contributes $\log(\beta_1)$ through the bias term, possibly together with an explicit $z(\omega)$ through a nearby entry. Since $z(\omega)$ contains a term that cancels the $\log(\beta_1)$ amount, the hash contributes the correct quantity

$$\log\left(\beta_1 + \beta_3\, \frac{g(\xi - \xi(\omega))}{f_Y(\xi)}\right).$$

In practice, a constant amount added to every hypothesis is irrelevant, since we only seek the top few hypotheses. We may therefore omit the bias in the computation and simply set

$$Z(\mathcal{H}) = \sum_{k_1 \in \mathcal{X}} \cdots \sum_{k_d \in \mathcal{X}} z([m, [j_1, j_2, \ldots, j_c, k_1, k_2, \ldots, k_d]]).$$

Quite simply, all entries with the same tag combine their $z(\omega)$ values to yield a degree of support for the corresponding hypothesis.
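Accumulating the entry weights into per-tag support values, with and without the bias term, can be sketched as follows. The function names and the numbers in the usage example are hypothetical. The example also checks the bookkeeping argument above. With the bias included, the support for a tag equals the sum of the per-hash contributions: $\log(\beta_1 + \beta_3 g/f_Y)$ for hashes near an entry of that tag, and $\log \beta_1$ for hashes near none.

```python
import math
from collections import defaultdict


def support_by_tag(tags, z):
    """Z(H) per tag [m, [j_1, ..., j_c]]: the sum of z(omega) over entries with that tag."""
    totals = defaultdict(float)
    for tag, w in zip(tags, z):
        totals[tag] += w
    return dict(totals)


def support_with_bias(tags, z, num_hashes, beta1):
    """The same totals plus the bias num_hashes * log(beta1) shared by every tag."""
    bias = num_hashes * math.log(beta1)
    return {tag: bias + s for tag, s in support_by_tag(tags, z).items()}


def top_hypotheses(support, k=1):
    """The k tags with the largest support."""
    return sorted(support, key=support.get, reverse=True)[:k]


# Hypothetical example: three hashes. Two land near the two entries tagged "A"
# (beta3*g/f_Y = 0.3 and 1.7); the third lands near the entry tagged "B" (0.2).
beta1 = 0.9
tags = ["A", "A", "B"]
z = [math.log1p(0.3 / beta1), math.log1p(1.7 / beta1), math.log1p(0.2 / beta1)]
with_bias = support_with_bias(tags, z, num_hashes=3, beta1=beta1)
without_bias = support_by_tag(tags, z)
```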


7.7  Exact  versus  Approximate  Matching

The hypothesis $\mathcal{H}(m, [j_1, j_2, \ldots, j_c], B)$ represents the statement that model $m$ is present in the image with the matching defined by placing the basis model features $f_{mj_1}, f_{mj_2}, \ldots, f_{mj_c}$ in correspondence, respectively, with the features in $B$. As we have formulated the hypothesis, the transformation is uniquely and precisely defined by this pairing of features. There are two reasons why the hypothesis might be rejected.




•  First, the matching may simply be incorrect: $B$ is not part of an instance of model $m$ with the association of model features given in the hypothesis;

•  Second, there might be too much noise in the image features of the chosen $B$. Then the transformation defined by pairing the model points $f_{mj_1}, f_{mj_2}, \ldots, f_{mj_c}$ respectively with $B$ does not bring the features of $m$ and $S'$ sufficiently close into correspondence. This can happen even though there exists a transformation that approximately brings the model points $f_{mj_1}, f_{mj_2}, \ldots, f_{mj_c}$ into correspondence with $B$, and other features of model $m$ into correspondence with features in $S'$, thereby establishing a valid interpretation of model $m$. In this case that transformation carries the model basis to points that lie in the neighborhood of the respective features of $B$.

With the probe-based search strategy for locating models in the image, the following can happen: we choose a collection of features $B$ as a basis, this basis belongs to a proper interpretation of a model, and yet the model is not recognized because of errors in the features of $B$. We identify three ways of dealing with this situation.

First, the problem may be ignored. Some valid hypotheses may then be rejected. However, the failure of a probe with one basis $B$ may be rescued by a subsequent probe with a different basis $B'$. A full treatment would analyze the probability that a given valid basis yields a rejection of the corresponding hypothesis, and then the probability that most or all bases on a given embedded model lead to rejections. We do not conduct such an analysis here. We note, however, that the best compromise matching will generally place some subset of model features (generally at least a basis set) close to the corresponding image features.

A second alternative is to modify the hypotheses so that each hypothesis represents an approximate match between the chosen basis and a model/basis combination. That is, we can let the hypothesis $\mathcal{H}$ assert an approximate matching. Figure 7.5 depicts the difference between exact-matching and approximate-matching hypotheses. Consider the computation of $f_{X \mid \mathcal{H}}$, the probability density of features predicted by the presence of model $m$ under an approximate matching of the basis $f_{mj_1}, f_{mj_2}, \ldots, f_{mj_c}$ with the image features $B$. This computation must account for potential noise in the features predicted to appear in the image, and also for possible noise in the basis features $B$. This is in fact the analysis that we carried out in chapter 6. Consider next $f_{Y \mid \mathcal{H}}$, the distribution in the space of invariants of $h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, X_1, X_2, \ldots, X_d)$. The result is that its spikes due to the predicted feature locations are more spread out. In terms of Eqn. 7.6, the local density function $g$ used to construct spikes in the space of invariants has a larger support. Approximate-matching hypotheses leave a number of free continuous parameters to be determined. When continuous parameters cannot be constrained to a finite collection of possibilities, filtering strategies or Hough transform methods become more appropriate. We are thus led to suggest a third alternative for dealing with the possible rejection of valid exact-matching hypotheses.

[Figure 7.5 panels: Exact-Matching Hypothesis | Approximate-Matching Hypothesis]

Figure 7.5 An exact-matching hypothesis as compared to an approximate-matching hypothesis. Note that in the case of an approximate-matching hypothesis, there is a greater range of uncertainty in the predicted image features that arise as a result of the remaining features of the model.

In the third alternative, the hypotheses represent approximate transformations and the density functions reflect the expanded uncertainty due to potential error in the basis features. However, instead of simply voting for the probability of a hypothesis based on a nearby entry in the space of invariants, we can vote for ranges of parameter values that are consistent with the observed positions of the features. The search may be organized differently than probes with basis sets. In general, if a hash $h(p_{\nu_1}, p_{\nu_2}, \ldots, p_{\nu_n})$ lands near an entry at location $h(f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_n})$, we may vote in a Hough space for a range of parameters that reasonably map the model features to the corresponding image features, and store the table with a hypothesis concerning the particular model. This is, for example, the approach of Gueziec [40] for matching 3D curves. We might instead have a separate Hough table for each approximate matching, and use votes for local parameter values to establish probable deformations of the basis features from the observed values. Methods combining geometric hashing with parameter voting (Hough transforms) are only beginning to be studied, and extensions of the Bayesian interpretation to the case where parameter spaces and Hough transforms are involved are not considered here.

Experiments  with  both  random  perturbations  of  synthetic  data  and  an  actual 
database  of  real-world  objects  indicated  that  the  use  of  approximate  hypotheses 
leads  to  better  results. 

7.8  False  Alarm  Rates 

We may now comment on false alarm rates under varying scenarios. We first suppose that the hash function is $\mathcal{T}$-specific and that exact-matching hypotheses are used. Further, to simplify things, let us assume that the number $d$ of excess features in the hash function is one.

If neighborhood voting is used and a hypothesis receives $n'$ votes during a probe, then we can be certain that, when the model basis of the hypothesis is put into correspondence with the basis of scene features from the probe, exactly $n'$ features fall within the neighborhood radius of the voting scheme. Ideally, the radius for the neighborhood voting scheme in the space of invariants has been adjusted for the probe according to the chosen basis set, so that the guarantee is that $n'$ features fall within one or two sigma (or a predetermined multiple of sigma) of the predicted values for the features.

If instead weighted voting is used, then, because there is a maximum weighted vote $w(\xi, \zeta)$ that any given hash value can give to a hypothesis, we can guarantee a minimum number of features from the corresponding model falling within a given radius of the predicted values if the total vote for a hypothesis is $Z(\mathcal{H})$. Further, if the appropriate Bayesian formula is used, $Z(\mathcal{H})$ will be an accurate measure related to the true a posteriori probability of the hypothesis. If the hash function is $\mathcal{T}'$-specific, where $\mathcal{T}'$ is a transformation class that strictly contains $\mathcal{T}$, then all of the same comments apply, but the postprocessing step will have to disambiguate between models that match with a $\mathcal{T}$ transformation and those that require the generality of a $\mathcal{T}'$ transformation.

In this sense, with exact-matching hypotheses and a $\mathcal{T}$-specific hash function, there can be no false alarms. The situation is more complicated with hash functions involving $d$-tuples of features appended to basis sets with $d > 1$, but similar guarantees can be given as a result of the computation of a support value for each hypothesis. These assure us that a measured value of $Z(\mathcal{H})$ above a certain level will yield a valid interpretation; a simpler argument suffices to show that the existence of a valid interpretation under an exact-matching hypothesis will yield a guaranteed level of support, even if the non-basis features match only within, say, one sigma (on average).

As indicated above, the use of exact-matching hypotheses can lead to a hypothesis being rejected even though an approximate matching of the basis of the model to the scene basis can be extended to a valid interpretation.

If we use approximate-matching hypotheses and neighborhood voting, then the guarantee that we can give for a total vote of $n'$ is much weaker. Each of the votes in the total count of $n'$ indicates that there exists a transformation taking the model basis, together with the ($d = 1$) extra feature of the model, to within one sigma (or an appropriate radius) of existing features in the scene. However, in the absence of a Hough space to measure deviations of the basis matching, there is no guarantee that the $n'$ votes correspond to the same transformation. We can only be guaranteed that there are $n'$ different transformations, each similar to the others in the sense that the basis set of the model is mapped to within a specified radius of the scene basis features, such that the remaining model features are individually approximated by scene features.

If weighted voting is used in conjunction with approximate-matching hypotheses, then we once again obtain a support value, which can be used to determine a minimum number of existing points such that an approximating transformation may be found for each, with the basis points of the model mapped to the vicinity of the scene basis. However, we are not guaranteed that the multiple transformations are the same. Indeed, in terms of the Bayesian analysis, the evidence is no longer independent due to the approximate-matching hypothesis. The difficulty is that the error in the basis features can be deduced from a sufficient number of corroborating hashes involving features from the model embedded in the scene. The net effect of the use of Bayesian-based geometric hashing with approximate-matching hypotheses is the same as with exact-matching hypotheses, but with larger error bounds on the unpaired features. Due to the lack of independence, we cannot say that the values give a precise measure of the relative a posteriori probabilities. We will nonetheless use the computations implied by the exact-matching hypotheses for the approximate-matching case, with the larger error bounds, in order to be able to detect models when the basis set matches only approximately, justifying the weighted-voting formulas empirically.

7.9  The  Formulas:  Exact  Matching 

We now apply the abstract theory. We are concerned with the matching of patterns of point features in 2D. Thus a point feature $p$ is specified by a coordinate pair $p = (x, y)$. In this section we treat the case of exact-matching hypotheses as a preparation for the next section, where the approximate-matching formulas are developed.

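We will consider two classes of transformations: similarity and affine. Respectively, a basis tuple is defined by two or three points. The hash functions, which in each case operate on a basis set plus one feature, are, for similarity invariance ($c = 2$):

$$h_s(p_1, p_2, p_3) = \begin{pmatrix} u \\ v \end{pmatrix}, \quad \text{where} \quad \begin{pmatrix} x_2 - x_1 & -y_2 + y_1 \\ y_2 - y_1 & x_2 - x_1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x_3 - (x_1 + x_2)/2 \\ y_3 - (y_1 + y_2)/2 \end{pmatrix},$$

and for affine invariance ($c = 3$):

$$h_a(p_1, p_2, p_3, p_4) = \begin{pmatrix} u \\ v \end{pmatrix}, \quad \text{where} \quad \begin{pmatrix} x_2 - x_1 & x_3 - x_1 \\ y_2 - y_1 & y_3 - y_1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x_4 - (x_1 + x_2 + x_3)/3 \\ y_4 - (y_1 + y_2 + y_3)/3 \end{pmatrix}.$$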

Here we have used the notation $p_i = (x_i, y_i)$, and we assume that the basis arguments (the first $c$ arguments to the hash function) are non-degenerate: distinct for $c = 2$ and non-collinear for $c = 3$. In both cases the center of the coordinate system is defined as the barycenter of the basis tuple (the first $c$ vectors in the arguments to $h$). As we saw in chapter 4, this choice maximizes the number of available symmetries and has certain advantages in terms of avoiding compounding of errors.

Each hash function has the form

$$h(p_1, p_2, \ldots, p_{c+1}) = \begin{pmatrix} u \\ v \end{pmatrix} = A^{-1} b,$$

where $A$ and $b$ appear in Eqns. 6.7 and 6.10, respectively.⁷
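As a minimal sketch (pure Python; the helper names `solve2`, `hash_similarity` and `hash_affine` are hypothetical, not taken from the implementation described in this thesis), both hash functions can be computed by solving the 2×2 system $A\,(u, v)^T = b$ directly. The usage lines check numerically that $h_s$ is unchanged under a similarity transformation and $h_a$ under an affine one; the points and transformations are arbitrary illustrative values.

```python
import math


def solve2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] (u, v)^T = (b1, b2)^T by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("degenerate basis")
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def hash_similarity(p1, p2, p3):
    """h_s: coordinates of p3 in the frame (p2 - p1, p2 - p1 rotated by 90 degrees),
    centered at the midpoint of p1 and p2."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return solve2(x2 - x1, -(y2 - y1), y2 - y1, x2 - x1,
                  x3 - (x1 + x2) / 2, y3 - (y1 + y2) / 2)


def hash_affine(p1, p2, p3, p4):
    """h_a: coordinates of p4 in the frame (p2 - p1, p3 - p1), centered at the centroid."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    return solve2(x2 - x1, x3 - x1, y2 - y1, y3 - y1,
                  x4 - (x1 + x2 + x3) / 3, y4 - (y1 + y2 + y3) / 3)


def similarity(scale, theta, tx, ty):
    """Rotation by theta, uniform scaling and translation."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (scale * (c * p[0] - s * p[1]) + tx, scale * (s * p[0] + c * p[1]) + ty)


def affine(a, b, c, d, tx, ty):
    """Non-singular affine map p -> M p + t with M = [[a, b], [c, d]]."""
    return lambda p: (a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty)


pts = [(0.0, 0.0), (2.0, 0.5), (0.7, 1.9), (1.3, -0.4)]
T_s = similarity(1.7, 0.6, 3.0, -2.0)
T_a = affine(1.2, 0.4, -0.3, 0.9, 5.0, 1.0)
hs_before, hs_after = hash_similarity(*pts[:3]), hash_similarity(*[T_s(p) for p in pts[:3]])
ha_before, ha_after = hash_affine(*pts), hash_affine(*[T_a(p) for p in pts])
```

Solving the small system directly, rather than forming $A^{-1}$ explicitly, keeps the sketch short and makes the degenerate-basis case easy to reject.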

It can be shown that the functions $h_s$ and $h_a$ are similarity-specific and affine-specific, respectively. We recall from section 7.1 that this means that the fibers of the hash functions are orbits of the respective transformation classes, so that if $h(p_1, p_2, \ldots, p_{c+1}) = h(p'_1, p'_2, \ldots, p'_{c+1})$, then the features $p_1, p_2, \ldots, p_{c+1}$ can be transformed to the features $p'_1, p'_2, \ldots, p'_{c+1}$ by a transformation in the respective class. Thus, if a hash value equals (or is very close to) a prerecorded entry, then we know that there is a transformation that brings the collection of features from the model close to the set of features from the image.

The two distribution functions of an entry due to positional noise in the image point features can be derived very easily. Recall that we make the assumption of exact-matching hypotheses. Consequently, in each case we assume that the first $c$ arguments are fixed and that the final point is perturbed by additive Gaussian positional noise with zero mean and covariance $\sigma^2 I$. Thus $p_1, p_2, \ldots, p_c$ are fixed and $X$ is a random vector with mean $p_{c+1}$ and covariance $\sigma^2 I$. Let us consider the random vector $h(p_1, p_2, \ldots, X)$. Since $h$ is linear in $p_{c+1}$ (that is, $h = A^{-1} b$ with $A$ fixed and $b$ affine in $p_{c+1}$), we conclude that $h(p_1, p_2, \ldots, X)$ is also Gaussian, with mean $h(p_1, p_2, \ldots, p_{c+1})$ and covariance

$$\Sigma = \sigma^2 \left[ A^T A \right]^{-1}.$$

More specifically, for the case of similarity, $A^T A = \|p_2 - p_1\|^2 I$, so that

$$\Sigma = \frac{\sigma^2}{\|p_2 - p_1\|^2} I.$$


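For the case of affine transformation we define $\delta_1 = p_2 - p_1$ and $\delta_2 = p_3 - p_1$. Then the covariance matrix is

$$\Sigma = \frac{\sigma^2}{\left| \delta_1^T \delta_2^{\perp} \right|^2} \begin{pmatrix} \|\delta_2\|^2 & -\delta_1^T \delta_2 \\ -\delta_1^T \delta_2 & \|\delta_1\|^2 \end{pmatrix},$$

where $\delta_2^{\perp}$ denotes $\delta_2$ rotated by 90 degrees, so that $|\delta_1^T \delta_2^{\perp}| = |\det A|$.

⁷For the special case $\alpha = \beta = \gamma = 1/3$.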
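The affine covariance can be checked numerically. The sketch below (pure Python; helper names are hypothetical) perturbs only the final point, as the exact-matching assumption prescribes, and compares the sample covariance of $h_a$ with $\sigma^2 [A^T A]^{-1}$ evaluated in closed form for $A = [\delta_1 \; \delta_2]$. The basis, $\sigma$ and the sample size are arbitrary illustrative choices.

```python
import random


def hash_affine(p1, p2, p3, p4):
    """h_a = A^{-1} b with A = [p2 - p1 | p3 - p1], b = p4 - centroid(p1, p2, p3)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    a11, a12, a21, a22 = x2 - x1, x3 - x1, y2 - y1, y3 - y1
    b1, b2 = x4 - (x1 + x2 + x3) / 3, y4 - (y1 + y2 + y3) / 3
    det = a11 * a22 - a12 * a21
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def exact_affine_cov(p1, p2, p3, sigma):
    """sigma^2 (A^T A)^{-1} in closed form, with delta1 = p2 - p1, delta2 = p3 - p1."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (p3[0] - p1[0], p3[1] - p1[1])
    cross = d1[0] * d2[1] - d1[1] * d2[0]          # det A (up to sign)
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    k = sigma ** 2 / cross ** 2
    return [[k * (d2[0] ** 2 + d2[1] ** 2), -k * dot],
            [-k * dot, k * (d1[0] ** 2 + d1[1] ** 2)]]


def sample_cov(samples):
    """Unbiased 2x2 sample covariance of a list of (u, v) pairs."""
    n = len(samples)
    mu = [sum(s[i] for s in samples) / n for i in range(2)]
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
             for j in range(2)] for i in range(2)]


rng = random.Random(0)
basis = [(0.0, 0.0), (4.0, 1.0), (1.0, 3.0)]
p4, sigma = (2.0, 2.0), 0.1
# Exact matching: the basis stays fixed and only the final point is noisy.
samples = [hash_affine(*basis, (p4[0] + rng.gauss(0, sigma), p4[1] + rng.gauss(0, sigma)))
           for _ in range(50000)]
predicted = exact_affine_cov(*basis, sigma)
estimated = sample_cov(samples)
```

Because $h_a$ is affine in the last point, the agreement here is limited only by sampling error, not by any linearization.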


Next we compute the distribution functions $f_{Y|\mathcal{H}}(\xi)$ and $f_Y(\xi)$. We use $Y = h(p_1, p_2, \ldots, X)$, where $X$ is now a random vector that takes on image features (throughout the entire image). For the unconditioned case ($f_Y$) we assume that $X$ takes on, with equal probability, any of $S - c$ features located uniformly throughout the image. For the conditioned case ($f_{Y|\mathcal{H}}$) we know that there are $n - c$ positions where features are likely to occur due to the completion of the interpretation represented by the hypothesis $\mathcal{H}$. If we assume that the rate of non-obscuration of model points in the image is $\rho$, then there are $n - c$ known positions with total density $\rho(n - c)$, which will correspond to Gaussian spikes in the space of invariants, and the remaining $S - c - \rho(n - c)$ total density of points will be evenly distributed throughout the image. Because the number of excess features $d$ is one, there is no “surface” density support.

Let $\mathcal{R}$ denote the image region (usually rectangular). The support of $f_Y(\xi)$ in the space of invariants is simply $h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, \mathcal{R})$, meaning the hash function with the fixed basis applied to the entire image domain as the final parameter. Since $f_Y$ is assumed constant over its support and integrates to $S - c$, and since this support has area $|\det(A^{-1})| \cdot \mathrm{Area}(\mathcal{R})$, we have that

$$f_Y(\xi) = \frac{S - c}{|\det(A^{-1})| \cdot \mathrm{Area}(\mathcal{R})},$$

where the function $\mathrm{Area}(\cdot)$ returns the area of its argument.

For the conditioned function we have Gaussian spikes at each entry $\omega_j$ located at

$$\zeta_j = \zeta(\omega_j) = h(f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_c}, f_{m,j}),$$

where $\omega_j$ is the entry $[m, [j_1, j_2, \ldots, j_c, j]]$, and it is understood that $m, j_1, j_2, \ldots, j_c$ are fixed and $j$ runs through the indices $1, 2, \ldots, n$, leaving out the basis indices $\{j_1, j_2, \ldots, j_c\}$. The entry is tagged with the information $[m, [j_1, j_2, \ldots, j_c]]$, which is the same information that is fixed by the hypothesis $\mathcal{H}(m, [j_1, j_2, \ldots, j_c], B)$. We thus have

$$f_{Y|\mathcal{H}}(\xi) = \frac{S - c - \rho(n - c)}{|\det(A^{-1})| \cdot \mathrm{Area}(\mathcal{R})} + \sum_{j=1}^{n-c} \frac{\rho}{2\pi \sqrt{\det(\Sigma)}} \exp\left( -\frac{(\xi - \zeta_j)^T \Sigma^{-1} (\xi - \zeta_j)}{2} \right).$$

Note that $\mathcal{H}$ is considered fixed and that the $\zeta_j$'s depend on $\mathcal{H}$.
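To make the simplification used in the next step concrete, here is a pure-Python sketch (hypothetical names; illustrative parameter values) that evaluates $f_Y$ and $f_{Y|\mathcal{H}}$ for the exact-matching similarity case and checks that their ratio collapses to $1 - \rho(n-c)/(S-c) + \sum_j \frac{\rho\,\mathrm{Area}(\mathcal{R})}{2\pi(S-c)\sigma^2} \exp(-\|A(\xi - \zeta_j)\|^2 / 2\sigma^2)$. The collapse happens because $\Sigma = \sigma^2 [A^T A]^{-1}$ gives $\sqrt{\det \Sigma} = \sigma^2 |\det(A^{-1})|$, which cancels the area of the support.

```python
import math


def inv2(m):
    """Inverse and determinant of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]], det


def exact_densities(xi, zetas, A, sigma, S, c, rho, area):
    """Return (f_Y(xi), f_{Y|H}(xi)); zetas holds the n - c predicted entries."""
    ata = [[A[0][0] ** 2 + A[1][0] ** 2, A[0][0] * A[0][1] + A[1][0] * A[1][1]],
           [A[0][0] * A[0][1] + A[1][0] * A[1][1], A[0][1] ** 2 + A[1][1] ** 2]]
    ata_inv, _ = inv2(ata)
    Sigma = [[sigma ** 2 * e for e in row] for row in ata_inv]   # sigma^2 [A^T A]^{-1}
    Sigma_inv, det_Sigma = inv2(Sigma)
    _, det_A = inv2(A)
    support_area = area / abs(det_A)                             # |det(A^{-1})| * Area(R)
    f_y = (S - c) / support_area
    f_yh = (S - c - rho * len(zetas)) / support_area
    for z in zetas:
        dx, dy = xi[0] - z[0], xi[1] - z[1]
        q = (dx * (Sigma_inv[0][0] * dx + Sigma_inv[0][1] * dy)
             + dy * (Sigma_inv[1][0] * dx + Sigma_inv[1][1] * dy))
        f_yh += rho / (2 * math.pi * math.sqrt(det_Sigma)) * math.exp(-0.5 * q)
    return f_y, f_yh


def simplified_ratio(xi, zetas, A, sigma, S, c, rho, area):
    """f_{Y|H} / f_Y after cancelling |det(A^{-1})| against sqrt(det Sigma)."""
    total = 1 - rho * len(zetas) / (S - c)
    for z in zetas:
        dx, dy = xi[0] - z[0], xi[1] - z[1]
        ex, ey = A[0][0] * dx + A[0][1] * dy, A[1][0] * dx + A[1][1] * dy  # offset in image units
        total += (rho * area / (2 * math.pi * (S - c) * sigma ** 2)
                  * math.exp(-(ex * ex + ey * ey) / (2 * sigma ** 2)))
    return total


A = [[40.0, -10.0], [10.0, 40.0]]                   # similarity basis with p2 - p1 = (40, 10)
zetas = [(0.1 * j, -0.05 * j) for j in range(10)]   # n - c = 10 hypothetical entries
args = ((0.31, -0.14), zetas, A, 1.5, 60, 2, 0.8, 512 * 512)
f_y, f_yh = exact_densities(*args)
```

The identity is exact (no proximity assumption is needed yet); the separation argument only enters when the summations are exchanged.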

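Next we compute the log-probability ratio for each hypothesis using Eqn. 7.4. We recall that the total support for a hypothesis is denoted by $Z(\mathcal{H})$. We will assume that each a priori probability $\Pr(\mathcal{H})$ is constant, and will omit additive constants (such as the $\log K$ term).

Using Eqn. 7.6 and the formulas for $f_Y$ and $f_{Y|\mathcal{H}}$, we obtain after some simplifications (the factor $|\det(A^{-1})| / \sqrt{\det(\Sigma)}$ reduces to $1/\sigma^2$)

$$Z(\mathcal{H}) = \sum_{k=1}^{S-c} \log\left( 1 - \frac{\rho(n - c)}{S - c} + \sum_{j=1}^{n-c} \frac{\rho \cdot \mathrm{Area}(\mathcal{R})}{2\pi (S - c) \sigma^2} \exp\left( -\frac{(\xi_k - \zeta_j)^T \Sigma^{-1} (\xi_k - \zeta_j)}{2} \right) \right),$$

where $\{\xi_1, \xi_2, \ldots, \xi_{S-c}\}$ are the hash locations of image point features using basis $B$.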

We wish to exchange the order of summations. This is possible because, for a given hypothesis $\mathcal{H}$ and a fixed image point (and thus a fixed hash location $\xi_k$), at most one model hash point $\zeta_j$ can be near $\xi_k$. We have assumed that the model points are distinct and separated, and thus for a fixed model/basis the resulting hashes $\zeta_1, \zeta_2, \ldots, \zeta_{n-c}$ are separated, so that at most one entry may be near the given point $\xi_k$. Consequently, all but one of the exponential terms are essentially zero for any given $k$. Thus we set

$$z_{k,j}(\mathcal{H}) = \begin{cases} \log\left( 1 - \frac{\rho(n - c)}{S - c} + \frac{\rho \cdot \mathrm{Area}(\mathcal{R})}{2\pi (S - c) \sigma^2} \exp\left( -\frac{(\xi_k - \zeta_j)^T \Sigma^{-1} (\xi_k - \zeta_j)}{2} \right) \right) \\ 0 \end{cases}$$

according to whether $\|\xi_k - \zeta_j\|$ is small (first line) or large (whence $z_{k,j}$ is set to zero). We then have approximately

$$Z(\mathcal{H}) \approx \sum_{j=1}^{n-c} \sum_{k=1}^{S-c} z_{k,j}(\mathcal{H}).$$

Other definitions for $z_{k,j}$ are possible and more elegant, but there is a potential advantage in assigning zero values to many of the variables.
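The exchange of summations can be illustrated with a small sketch (hypothetical names and numbers). It works with the squared distance $d^2 = (\xi_k - \zeta_j)^T \Sigma^{-1} (\xi_k - \zeta_j)$ and abbreviates $a = \rho(n-c)/(S-c)$ and $b = \rho\,\mathrm{Area}(\mathcal{R})/(2\pi(S-c)\sigma^2)$. For a hash that lies near exactly one entry, the per-point logarithm computed before the exchange coincides numerically with the single nonzero $z_{k,j}$. A hash that is near no entry contributes about $\log(1 - a)$ before the exchange but exactly $0$ under the zero-valued convention for $z_{k,j}$, so the double sum is an approximation rather than an identity.

```python
import math


def z_kj(d2, a, b, cutoff):
    """z_{k,j}: log(1 - a + b exp(-d2/2)) if d2 <= cutoff (a nearby pair), else 0."""
    return math.log(1 - a + b * math.exp(-0.5 * d2)) if d2 <= cutoff else 0.0


def per_point_log_ratio(d2_row, a, b):
    """log(f_{Y|H} / f_Y) at one hash xi_k, before the summations are exchanged."""
    return math.log(1 - a + b * sum(math.exp(-0.5 * d2) for d2 in d2_row))


def support(d2_matrix, a, b, cutoff):
    """Z(H) ~ sum_j sum_k z_{k,j}; d2_matrix[k][j] is d^2 for hash k and entry j."""
    return sum(z_kj(d2, a, b, cutoff) for row in d2_matrix for d2 in row)


a, b, cutoff = 0.15, 150.0, 25.0     # hypothetical values; cutoff = (5 sigma)^2
matched = [0.4, 400.0, 900.0]         # hash near entry 0 only
unmatched = [100.0, 400.0, 900.0]     # hash near no entry
Z = support([matched, unmatched], a, b, cutoff)
```

Storing only the nearby pairs is what allows the probe to touch a handful of entries per hash instead of all $n - c$ of them.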

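Finally, we describe the Bayesian geometric hashing algorithm for each of the two transformations. The space of invariants contains entries with model/basis tags of the form $[m, [j_1, j_2, \ldots, j_c]]$. The entries are located at positions $\zeta = h(f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_c}, f_{m,j})$, where $f_{m,j}$ is an arbitrary point in the model not included in the first $c$ arguments (the basis tuple). During the recognition phase we choose a basis of point features $p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}$ and perform a probe during which we apply votes to entries. Initially a vote of zero is stored for each entry. We compute hash locations $\xi = h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, p_\nu)$ for point features $p_\nu \in S'$, i.e., using the remaining point features in the image. For each such hash $\xi$ we locate all entries at nearby positions $\zeta$. For each such entry we compute the value $w(\xi, \zeta)$ for the appropriate $c$. For similarity invariance ($c = 2$) we have

$$w_s(\xi, \zeta) = \log\left( 1 - \frac{\rho(n - 2)}{S - 2} + \frac{\rho \cdot \mathrm{Area}(\mathcal{R})}{2\pi (S - 2) \sigma^2} \exp\left( -\frac{\|p_{\mu_2} - p_{\mu_1}\|^2 \, \|\xi - \zeta\|^2}{2\sigma^2} \right) \right),$$

and for affine invariance ($c = 3$), with $\delta_1 = p_{\mu_2} - p_{\mu_1}$, $\delta_2 = p_{\mu_3} - p_{\mu_1}$ and $(\Delta u, \Delta v) = \xi - \zeta$,

$$w_a(\xi, \zeta) = \log\left( 1 - \frac{\rho(n - 3)}{S - 3} + \frac{\rho \cdot \mathrm{Area}(\mathcal{R})}{2\pi (S - 3) \sigma^2} \exp\left( -\frac{\|\Delta u \, \delta_1 + \Delta v \, \delta_2\|^2}{2\sigma^2} \right) \right).$$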


We might note that in both cases the argument to the exponential function is minus one-half the square of the distance, measured relative to the standard deviation $\sigma$, between the predicted location of the corresponding point in the image and the observed location of the point.

Having computed the weight $w(\xi, \zeta)$ for the entry $\omega$, we update the $z$ value at $\omega$ by replacing $z(\omega)$ with the maximum of the current value and the weight. This is done for every entry $\omega$ in the neighborhood of the hash $\xi$, and is repeated for every hash from the scene. The probe is concluded by summing the $z$ values associated with entries having equal tags:

$$Z(\mathcal{H}) = \sum_j z([m, [j_1, j_2, \ldots, j_c, j]]).$$

Here $\mathcal{H} = [m, [j_1, j_2, \ldots, j_c]]$. The hypotheses with the top few $Z(\mathcal{H})$ values are candidates for interpretations using the basis $B$.
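A compact sketch of one probe for the similarity case, following the steps just described: initialize $z$ to zero, hash the non-basis image points, update nearby entries with a running maximum, and sum by tag. The entry list, the linear scan (standing in for a hash-table lookup), and the constants in `weight` are hypothetical simplifications; in particular `weight` uses a fixed Gaussian width in hash-space units rather than the basis-dependent distance in image units used in the formulas for $w(\xi, \zeta)$. Tags are Python tuples `(m, (j1, j2))` mirroring $[m, [j_1, j_2]]$.

```python
import math
from collections import defaultdict


def hash_similarity(p1, p2, p3):
    """h_s = A^{-1} b with A = [[dx, -dy], [dy, dx]] and b = p3 - (p1 + p2)/2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    bx, by = p3[0] - (p1[0] + p2[0]) / 2, p3[1] - (p1[1] + p2[1]) / 2
    n2 = dx * dx + dy * dy                      # det A
    return (dx * bx + dy * by) / n2, (dx * by - dy * bx) / n2


def build_entries(models, basis=(0, 1)):
    """Preprocessing: one entry (zeta, tag) per non-basis model point, tag = (m, basis)."""
    return [(hash_similarity(pts[basis[0]], pts[basis[1]], f), (name, basis))
            for name, pts in models.items()
            for j, f in enumerate(pts) if j not in basis]


def probe(entries, image, basis_idx, weight, radius):
    """One probe with a single image basis; returns {tag: Z(H)}."""
    b1, b2 = (image[i] for i in basis_idx)
    z = defaultdict(float)                      # z(omega), implicitly 0 for every entry
    for k, p in enumerate(image):
        if k in basis_idx:
            continue
        xi = hash_similarity(b1, b2, p)
        for idx, (zeta, _tag) in enumerate(entries):   # linear scan in place of a hash table
            if math.dist(xi, zeta) <= radius:
                z[idx] = max(z[idx], weight(xi, zeta))
    support = defaultdict(float)
    for idx, value in z.items():                # Z(H) = sum_j z([m, [j1, j2, j]])
        support[entries[idx][1]] += value
    return dict(support)


A_PARAM, B_PARAM, S_HASH = 0.1, 5.0, 0.02     # hypothetical weight constants


def weight(xi, zeta):
    """Simplified w(xi, zeta) with a fixed hash-space width S_HASH."""
    return math.log(1 - A_PARAM + B_PARAM * math.exp(-math.dist(xi, zeta) ** 2 / (2 * S_HASH ** 2)))


models = {"M1": [(0, 0), (4, 0), (1, 3), (3, 2), (2, -1), (5, 3)],
          "M2": [(0, 0), (1, 0), (8, 9), (9, -7)]}
entries = build_entries(models)
cs, sn = math.cos(0.4), math.sin(0.4)


def T(p):
    """A similarity transformation placing model M1 in the 'image'."""
    return 1.5 * (cs * p[0] - sn * p[1]) + 7, 1.5 * (sn * p[0] + cs * p[1]) - 3


image = [T(p) for p in models["M1"]] + [T(q) for q in [(-1, 4), (6, -2)]]   # M1 plus clutter
support = probe(entries, image, (0, 1), weight, radius=0.05)
```

Since every weight is at most its zero-distance value, the four exactly matched points of `M1` give that hypothesis four times the maximum vote, while `M2` receives no votes.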


Since the maximum value that a hash can contribute to a hypothesis is

$$z_0 = \log\left( 1 - \frac{\rho(n - c)}{S - c} + \frac{\rho \cdot \mathrm{Area}(\mathcal{R})}{2\pi (S - c) \sigma^2} \right),$$

we know that a hypothesis with a total vote of $Z(\mathcal{H})$ must have at least $Z(\mathcal{H}) / z_0$ corroborating points in the neighborhood of the positions predicted by the hypothesized match. A summary of the 2D point pattern recognition algorithm is given in Figure 7.6.
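The bound can be evaluated directly. A minimal sketch, assuming hypothetical scene parameters (a 512 × 512 image, $S = 100$ features, a model with $n = 20$ points, a similarity basis, 80% visibility, $\sigma = 1.5$ pixels); the function names are illustrative.

```python
import math


def max_vote(rho, n, c, S, area, sigma):
    """z_0: the largest weight a single hash can contribute (zero distance)."""
    return math.log(1 - rho * (n - c) / (S - c)
                    + rho * area / (2 * math.pi * (S - c) * sigma ** 2))


def min_corroborating(Z, z0):
    """Each nonzero vote is at most z0 > 0, so at least ceil(Z / z0) votes produced Z."""
    return math.ceil(Z / z0)


z0 = max_vote(rho=0.8, n=20, c=2, S=100, area=512 * 512, sigma=1.5)
needed = min_corroborating(3 * z0 + 0.1, z0)
```

Rounding up is valid because the number of corroborating points is an integer; the bound requires $z_0 > 0$, which holds whenever the spike term dominates $\rho(n-c)/(S-c)$, as it does here.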

Finally, we briefly consider the size of the neighborhood in which nearby hashes should be accessed. Given a hash at location $\xi$, the vote applied to nearby entries is $w(\xi, \zeta)$, and zero for entries that are far away (the vote is applied in a maximum way). Since the function $w(\xi, \zeta)$ decays rapidly as the separation increases, the neighborhood can be as large as we like. If a vote is applied to two entries with the same tag, then the approximation to Eqn. 7.9 is violated, but due to the separation of hashes with equal tags at least one of those votes will be close to zero, which has no consequence for the vote at that site. However, it is desirable to use a small neighborhood to reduce the number of entries to which votes must be applied. It suffices to find a distance at which the weight is guaranteed to be small, i.e., a small fraction of its maximum value $z_0$, relative to the $n - c$ entries that will be combined to form the total vote $Z(\mathcal{H})$.

[Figure 7.6: flowchart. IMAGE → choose image basis (N-tuple) → compute hashes using other image points → in the space of invariants, access nearby entries and compute w(ξ, ζ) for each such entry → replace z(ω) by max(z(ω), w(ξ, ζ)), where the z-values are indexed by entries ω and are initially all equal to 0 → sum z-values among entries with the same model/basis tags to obtain support values for the various model/basis hypotheses.]

Figure 7.6 The steps for a probe with a single basis set during the recognition phase of the Bayesian geometric hashing algorithm for point pattern recognition.
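The radius rule can be made concrete. Writing the weight as $\log(1 - a + b\,e^{-d^2/2})$, with $d$ the image-space distance in units of $\sigma$, $a = \rho(n-c)/(S-c)$ and $b = \rho\,\mathrm{Area}(\mathcal{R})/(2\pi(S-c)\sigma^2)$, the distance at which the weight falls to $\varepsilon z_0$ has a closed form. The sketch below (hypothetical names; illustrative parameters) computes it and converts it to a hash-space radius for a similarity basis, where image distances shrink by the factor $\|p_{\mu_2} - p_{\mu_1}\|$.

```python
import math


def sigma_radius(a, b, eps):
    """Distance d (in units of sigma, measured in the image) at which
    w = log(1 - a + b exp(-d^2/2)) has dropped to eps * z0, with z0 = log(1 - a + b)."""
    z0 = math.log(1 - a + b)
    target = (math.exp(eps * z0) - (1 - a)) / b     # required value of exp(-d^2/2)
    if target >= 1.0:                                # eps >= 1: no neighborhood needed
        return 0.0
    return math.sqrt(-2.0 * math.log(target))


def hash_radius_similarity(d_sigma, sigma, basis_length):
    """For h_s, an image distance d_sigma * sigma maps to d_sigma * sigma / ||p_mu2 - p_mu1||."""
    return d_sigma * sigma / basis_length


a = 0.8 * 18 / 98                                    # rho (n - c) / (S - c)
b = 0.8 * 512 * 512 / (2 * math.pi * 98 * 1.5 ** 2)  # rho Area(R) / (2 pi (S - c) sigma^2)
d = sigma_radius(a, b, eps=0.01)
r = hash_radius_similarity(d, sigma=1.5, basis_length=50.0)
```

Choosing $\varepsilon$ on the order of $1/(n-c)$ is one way to read "relative to the $n - c$ entries"; that choice is an assumption here. Beyond this radius the weight can become slightly negative, but under the maximum update with initial value zero such votes have no effect.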

7.10  The  Formulas:  Approximate  Matching 

In  this  section  we  present  the  formulas  for  the  case  of  weighted  voting  and 
approximate-matching  hypotheses. 

We modify the hypotheses so that each hypothesis represents an approximate matching between the selected basis and a model/basis combination: $\mathcal{H}$ will now imply an approximate matching. As in the exact-matching case, two classes of transformations will be considered, namely similarity and affine.

The hash functions again operate on a basis set plus one more feature and have the form

$$h(p_1, p_2, \ldots, p_{c+1}) = \begin{pmatrix} u \\ v \end{pmatrix} = A^{-1} b,$$

where $A$ and $b$ appear in Eqns. 6.7 and 6.10, respectively.⁸

Since we make the assumption of approximate-matching hypotheses, such an analysis must take into account potential noise in the basis features $B$. This was the subject of chapter 6, where we examined how the error propagates if we assume that the positions of all the point features of the $(c + 1)$-tuple are disturbed by additive Gaussian noise. The conclusion of that analysis was that $h(p_1, p_2, \ldots, X)$ is (to first order) also Gaussian, with mean $h(p_1, p_2, \ldots, p_{c+1})$ and covariance matrices

$$\Sigma = \frac{(4\|(u, v)\|^2 + 3)\, \sigma^2}{2\, \|p_2 - p_1\|^2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

and

$$\Sigma = \frac{(4(u^2 + v^2 + uv) + 8/3)\, \sigma^2}{2\, \left|(p_2 - p_1)^T (p_3 - p_1)^{\perp}\right|^2} \begin{pmatrix} \|p_3 - p_1\|^2 & -(p_2 - p_1)^T (p_3 - p_1) \\ -(p_2 - p_1)^T (p_3 - p_1) & \|p_2 - p_1\|^2 \end{pmatrix}$$

for the similarity and affine cases, respectively.

⁸For the special case $\alpha = \beta = \gamma = 1/3$.
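A quick Monte Carlo check of the similarity formula: the sketch below perturbs all three points, basis included, as the approximate-matching assumption requires, and compares the sample covariance of $h_s$ with $(4\|(u,v)\|^2 + 3)\sigma^2 / (2\|p_2 - p_1\|^2)\, I$. Because that formula comes from first-order error propagation, the sketch uses a noise level that is small relative to the basis length. The points, $\sigma$ and the sample size are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import random


def hash_similarity(p1, p2, p3):
    """h_s = A^{-1} b with A = [[dx, -dy], [dy, dx]] and b = p3 - (p1 + p2)/2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    bx, by = p3[0] - (p1[0] + p2[0]) / 2, p3[1] - (p1[1] + p2[1]) / 2
    n2 = dx * dx + dy * dy
    return (dx * bx + dy * by) / n2, (dx * by - dy * bx) / n2


def predicted_variance(p1, p2, p3, sigma):
    """Diagonal entry of (4||(u, v)||^2 + 3) sigma^2 / (2 ||p2 - p1||^2) I."""
    u, v = hash_similarity(p1, p2, p3)
    base2 = (p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2
    return (4 * (u * u + v * v) + 3) * sigma ** 2 / (2 * base2)


def jitter(p, sigma, rng):
    """Add isotropic Gaussian noise with standard deviation sigma per coordinate."""
    return p[0] + rng.gauss(0, sigma), p[1] + rng.gauss(0, sigma)


rng = random.Random(1)
p1, p2, p3, sigma = (0.0, 0.0), (10.0, 2.0), (7.0, 9.0), 0.02
# Approximate matching: every point of the triple, basis included, is perturbed.
samples = [hash_similarity(jitter(p1, sigma, rng), jitter(p2, sigma, rng), jitter(p3, sigma, rng))
           for _ in range(40000)]
N = len(samples)
mu_u = sum(s[0] for s in samples) / N
mu_v = sum(s[1] for s in samples) / N
var_u = sum((s[0] - mu_u) ** 2 for s in samples) / (N - 1)
var_v = sum((s[1] - mu_v) ** 2 for s in samples) / (N - 1)
cov_uv = sum((s[0] - mu_u) * (s[1] - mu_v) for s in samples) / (N - 1)
expected = predicted_variance(p1, p2, p3, sigma)
```

The covariance grows with $\|(u, v)\|$: points far from the basis midpoint are more sensitive to basis errors, which is why the approximate-matching spikes widen away from the origin of the space of invariants.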

We next compute the distribution functions $f_{Y|\mathcal{H}}(\xi)$ and $f_Y(\xi)$. We use $Y = h(p_1, p_2, \ldots, X)$, where $X$ is now a random vector that takes on image features (throughout the entire image). As before, for the unconditioned case ($f_Y$) we assume that $X$ takes on, with equal probability, any of $S - c$ features located uniformly throughout the image. For the conditioned case ($f_{Y|\mathcal{H}}$) we know that there are $n - c$ positions where features are likely to occur due to the completion of the interpretation represented by the hypothesis $\mathcal{H}$. If $\rho$ denotes the rate of non-obscuration of model point features in the image, then there are $n - c$ known positions with total density $\rho(n - c)$, which will correspond to Gaussian spikes in the space of invariants, and the remaining $S - c - \rho(n - c)$ total density of points will be evenly distributed throughout the image. Because the number of excess features $d$ is one, there is no “surface” density support.

Unlike the exact-matching case, we use the same probability density function for the invariants, independent of the selected basis $B$. In chapter 4 we determined analytically the expected probability density function $f_e(\xi)$ for invariants over all possible basis selections, for a number of transformation and feature distribution combinations. Since $f_Y$ integrates to $S - c$, we have that

$$f_Y(\xi) = (S - c) \cdot f_e(\xi).$$

Alternatively, and depending on the application, $f_e(\xi)$, and thus $f_Y$, can be determined empirically. For the conditioned function we have Gaussian spikes at each entry $\omega_j$ located at

$$\zeta_j = \zeta(\omega_j) = h(f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_c}, f_{m,j}),$$

where $\omega_j$ is the entry $[m, [j_1, j_2, \ldots, j_c, j]]$, and it is understood that $m, j_1, j_2, \ldots, j_c$ are fixed and $j$ runs through the indices $1, 2, \ldots, n$, leaving out the basis indices $\{j_1, j_2, \ldots, j_c\}$. The entry is tagged with the information $[m, [j_1, j_2, \ldots, j_c]]$, which is the same information that is fixed by the hypothesis $\mathcal{H}(m, [j_1, j_2, \ldots, j_c], B)$. We thus have

$$f_{Y|\mathcal{H}}(\xi) = \left(S - c - \rho(n - c)\right) \cdot f_e(\xi) + \sum_{j=1}^{n-c} \frac{\rho}{2\pi \sqrt{|\det(\Sigma_j)|}} \exp\left( -\frac{(\xi - \zeta_j)^T \Sigma_j^{-1} (\xi - \zeta_j)}{2} \right).$$

Note that $\mathcal{H}$ is considered fixed and that both $\Sigma_j$ and $\zeta_j$ depend on $\mathcal{H}$.

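We can now compute the log-probability ratio for each approximate hypothesis using Eqn. 7.4. Once again $Z(\mathcal{H})$ denotes the total support for a hypothesis, each $\Pr(\mathcal{H})$ is constant, and additive constants (i.e., the $\log K$ term) are omitted.

Using Eqn. 7.6 and the formulas for $f_Y$ and $f_{Y|\mathcal{H}}$, we obtain

$$Z(\mathcal{H}) = \sum_{k=1}^{S-c} \log\left( 1 - \frac{\rho(n - c)}{S - c} + \sum_{j=1}^{n-c} \frac{\rho}{(S - c)\, f_e(\xi_k)\, 2\pi \sqrt{|\det(\Sigma_j)|}} \exp\left( -\frac{(\xi_k - \zeta_j)^T \Sigma_j^{-1} (\xi_k - \zeta_j)}{2} \right) \right),$$

where $\{\xi_1, \xi_2, \ldots, \xi_{S-c}\}$ are the hash locations of image point features using basis $B$.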

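Working as in section 7.9, we can show that

$$z_{k,j}(\mathcal{H}) = \begin{cases} \log\left( 1 - \frac{\rho(n - c)}{S - c} + \frac{\rho}{(S - c)\, f_e(\xi_k)\, 2\pi \sqrt{|\det(\Sigma_j)|}} \exp\left( -\frac{(\xi_k - \zeta_j)^T \Sigma_j^{-1} (\xi_k - \zeta_j)}{2} \right) \right) \\ 0 \end{cases}$$

according to whether $(\xi_k - \zeta_j)^T \Sigma_j^{-1} (\xi_k - \zeta_j)$ is small (first line) or large (whence $z_{k,j}$ is set to zero). Approximately, we can write

$$Z(\mathcal{H}) \approx \sum_{j=1}^{n-c} \sum_{k=1}^{S-c} z_{k,j}(\mathcal{H}).$$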
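The approximate-matching weight can be sketched generically, with $\Sigma_j$ and $f_e(\xi_k)$ passed in by the caller; which $(u, v)$ to plug into the covariance formulas of this section, and which $f_e$ to use (analytic from chapter 4 or empirical), are left open here. The function name is hypothetical. The usage lines check one consistency property: with a uniform $f_e$ over the support $h(B, \mathcal{R})$ and $\Sigma = \sigma^2 [A^T A]^{-1}$, the expression reduces to the exact-matching weight.

```python
import math


def approx_weight(xi, zeta, Sigma, fe_xi, rho, n, c, S):
    """z_{k,j} for a nearby pair under approximate matching:
    log(1 - rho(n-c)/(S-c) + rho / ((S-c) fe(xi) 2 pi sqrt|det Sigma|) exp(-q/2)),
    where q = (xi - zeta)^T Sigma^{-1} (xi - zeta)."""
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    dx, dy = xi[0] - zeta[0], xi[1] - zeta[1]
    # Sigma^{-1} = (1/det) [[s22, -s12], [-s21, s11]]
    q = (dx * (Sigma[1][1] * dx - Sigma[0][1] * dy)
         + dy * (Sigma[0][0] * dy - Sigma[1][0] * dx)) / det
    spike = rho / ((S - c) * fe_xi * 2 * math.pi * math.sqrt(abs(det))) * math.exp(-0.5 * q)
    return math.log(1 - rho * (n - c) / (S - c) + spike)


# Consistency check: similarity basis with ||p2 - p1||^2 = 1700, uniform f_e over
# h(B, R), and Sigma = sigma^2 [A^T A]^{-1} = (sigma^2 / 1700) I.
sigma, rho, n, c, S, area = 1.5, 0.8, 20, 2, 100, 512 * 512
Sigma = [[sigma ** 2 / 1700, 0.0], [0.0, sigma ** 2 / 1700]]
fe_uniform = 1700 / area                    # 1 / (|det(A^{-1})| Area(R))
xi, zeta = (0.30, 0.20), (0.31, 0.21)
w_approx = approx_weight(xi, zeta, Sigma, fe_uniform, rho, n, c, S)
d2 = (xi[0] - zeta[0]) ** 2 + (xi[1] - zeta[1]) ** 2
w_exact = math.log(1 - rho * (n - c) / (S - c)
                   + rho * area / (2 * math.pi * (S - c) * sigma ** 2)
                   * math.exp(-1700 * d2 / (2 * sigma ** 2)))
```

The two differences from the exact case are visible in the signature: the covariance varies per entry, and the background density $f_e$ is evaluated at the observed hash instead of being constant over the support.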

The Bayesian geometric hashing algorithm for each of the two transformations, under the assumption of approximate hypotheses, proceeds as before: the space of invariants contains entries with model/basis tags of the form $[m, [j_1, j_2, \ldots, j_c]]$, located at positions $\zeta = h(f_{m,j_1}, f_{m,j_2}, \ldots, f_{m,j_c}, f_{m,j})$, where $f_{m,j}$ is an arbitrary point in the model not included in the first $c$ arguments (the basis tuple). During the recognition phase we choose a basis of image point features $p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}$ and perform a probe during which we apply votes to entries. Initially a vote of zero is stored for each such entry. We compute hash locations $\xi = h(p_{\mu_1}, p_{\mu_2}, \ldots, p_{\mu_c}, p_\nu)$ for point features $p_\nu \in S'$, i.e., using the remaining point features in the image. For each such hash $\xi$ we locate all entries at nearby positions $\zeta$. For each such position we compute the value $w(\xi, \zeta)$ for the appropriate $c$. Simple substitution shows that for similarity invariance ($c = 2$) we have
For affine invariance ($c = 3$) we set $\delta_1 = p_{\mu_2} - p_{\mu_1}$ and $\delta_2 = p_{\mu_3} - p_{\mu_1}$. Then

In the above two formulas, $f_e(\xi)$ should be replaced by the appropriate expression for the probability density; the expression can either be taken from the analysis of chapter 4 or derived empirically.

Having computed the weight $w(\xi, \zeta)$ for the entry $\omega$, we update the $z$ value at $\omega$ by replacing $z(\omega)$ with the maximum of the current value and the weight. This is done for every entry $\omega$ in the neighborhood of the hash $\xi$, and is repeated for every hash from the scene. The probe is concluded by summing the $z$ values associated with entries having equal tags:

$$Z(\mathcal{H}) = \sum_j z([m, [j_1, j_2, \ldots, j_c, j]]).$$

Here $\mathcal{H} = [m, [j_1, j_2, \ldots, j_c]]$. The hypotheses with the top few $Z(\mathcal{H})$ values are candidates for interpretations using the basis $B$. Figure 7.6 shows a summary of the algorithm; in this case the approximate-hypotheses expressions for $w(\cdot, \cdot)$ and $z(\cdot)$ are to be used.

We should mention again that our use of approximate-matching hypotheses implies that, in the Bayesian analysis above, the evidence is no longer independent. In fact, the error in the basis features can be deduced from a sufficient number of corroborating hashes involving features from the model that is embedded in the scene. As a result of the lack of independence, the values we compute give only an approximate measure of the relative a posteriori probabilities.

We already stated that experimental evidence with both synthetic and real-world data supports the use of approximate-matching hypotheses; in the next chapter we will present experimental results from the implementation of a system that makes use of the approximate-matching approach to the accumulation of evidence.
Chapter 8


Experimental  Results 


In  this  chapter  we  demonstrate  the  validity  of  the  Bayesian  geometric  hashing 
algorithm  that  uses  the  approximate  matching  hypotheses  (see  section  7.10).  In 
particular,  we  describe  in  detail  the  actual  implementation  of  a  complete  object 
recognition  system. 

The recognition system is implemented on an 8K-processor CM-2 and can recognize objects that have undergone a similarity transformation (i.e., rotation, translation, and scaling), from a library of 32 models. The models we use are military aircraft and production automobiles.

We test the approach using real-world imagery; in particular, the test inputs are photographic data of military aircraft in flight, and of automobiles in street scenes. The resulting system is scalable, works rapidly and very efficiently on an 8K-processor machine, and the quality of results is excellent.

8.1  Off-Line  Preprocessing 

Since our intention was to build a complete object recognition system, we incorporated an automatic feature extraction mechanism. A straightforward boundary following algorithm¹ was applied to the output of the edge detection stage, followed by a simple divide-and-conquer polygonal approximation algorithm [27]. The edge detector that we used is the one described by Cox and Boie in [13].² The output of the edge detection stage was a collection of curves (an edge map), and polygonal approximations for each of those curves were determined; curves shorter than 100 pixels were not considered. No other filtering or preprocessing was performed.
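For concreteness, here is a minimal sketch of one common divide-and-conquer polygonal approximation: recursive splitting at the point farthest from the chord, as in the Ramer-Douglas-Peucker scheme. Whether [27] uses exactly this variant is an assumption, and the tolerance value and function names are hypothetical; the 100-pixel filter mirrors the text, taking a curve's length to be its number of pixels.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def split_polygon(curve, tol):
    """Keep the endpoints; if the farthest interior point exceeds tol, split there and recurse."""
    if len(curve) < 3:
        return list(curve)
    a, b = curve[0], curve[-1]
    idx, dmax = max(((i, point_line_distance(curve[i], a, b)) for i in range(1, len(curve) - 1)),
                    key=lambda t: t[1])
    if dmax <= tol:
        return [a, b]
    left = split_polygon(curve[:idx + 1], tol)
    right = split_polygon(curve[idx:], tol)
    return left[:-1] + right          # drop the duplicated split vertex


def approximate_curves(curves, tol=2.0, min_pixels=100):
    """Polygonal approximation of every edge curve with at least min_pixels pixels."""
    return [split_polygon(c, tol) for c in curves if len(c) >= min_pixels]


L_shape = [(x, 0) for x in range(60)] + [(59, y) for y in range(1, 60)]   # 119 pixels
short = [(x, x) for x in range(50)]                                       # 50 pixels, dropped
polys = approximate_curves([L_shape, short])
```

The recursion only splits where the curve actually deviates from its chord, so straight runs of edge pixels collapse to single segments whose vertices can serve as point features.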

There was no attempt to implement the edge detection, boundary following and feature extraction stages in parallel. Indeed, the corresponding code is serial and runs on the Front End (see also [81]). Descriptions of very fast parallel algorithms for these operations, based on replicating data structures, can be found in [71, 72].

The database in our experiments contained thirty-two models: fourteen of the models were military aircraft, whereas the remaining eighteen were automobiles (six automobiles seen from three different viewpoints). The database models were allowed to undergo similarity transformations (i.e., rotation, translation and scaling).

For the aircraft models, the profile drawings of fourteen military aircraft from [77] were scanned using a Microtek 300 color/grayscale scanner. These drawings are not photographs: they are schematic drawings that are probably drawn roughly to scale. The scanner is capable of a resolution of 300 dpi; however, we used a resolution of 120 dpi to digitize the drawings.

For the automobile models, we obtained photographs of six different automobiles seen from three different viewpoints: the camera was at the height of the automobiles' midline, with its optical axis pointing at the middle of the automobiles' long side. The three viewpoints corresponded to azimuth values of roughly −45, 0 and +45 degrees. The average distance from the automobiles was approximately fifty feet. The photographs were subsequently scanned at a resolution of 75 dpi. The fourteen aircraft and six automobile types contained in our database appear in Table 8.1.

¹ The boundary-following algorithm assumes eight-connectivity.

² The value of the filter's σ was typically equal to 1.5 (see [13]).


Aircraft:
  A-4 Skyhawk              A-6 Intruder
  A-10 Thunderbolt         F-14 Tomcat
  F-15 Eagle               F-16 Falcon
  F/A-18 Hornet            Mig-21 Fishbed
  Mig-23 Flogger           Mig-29 Fulcrum
  Mig-31 Foxhound          Mirage 2000
  Sea Harrier              Panavia Tornado

Automobiles (each in lateral, oblique frontal and oblique rear view):
  Chevrolet Astro          Dodge Dart
  Ford Econoline 150       Chrysler Horizon
  Honda Civic              Volvo S.-W.

Table 8.1  The thirty-two models of the database.


The vertices of the different approximating polygons coincided either with points of discontinuity in the tangent direction of the model's contour (i.e., vertices or points of very high curvature) or with points of maximum curvature. A subset of sixteen points was selected from each model's point feature set. It should be stressed that this model feature selection is carried out during the building of the database and is thus performed off-line. Figure 8.1 shows the edge maps and the selected points for three of the database models: the F-16 Falcon, the Sea Harrier and the Ford Econoline 150.

Clearly, more sophisticated approaches, such as spline fitting, could be used to determine the feature set: our deliberately simple choice of feature detection mechanism reflected our desire to determine the limitations of the proposed object recognition system.


8.2  The  Two-level  Randomized  Algorithm 


In  this  section,  we  describe  in  more  detail  the  probe  selection  mechanism,  which  is 
independent  of  whether  we  use  the  formulas  for  approximate  or  exact  matching. 

The Bayesian geometric hashing algorithm requires that a basis probe with c members be selected from the point feature set. Given such a probe, hashes are determined for all of the remaining S − c image point features, and z-values are computed (see Figure 7.6).

The main component of our probe selection algorithm takes a straightforward approach: the basis members are selected uniformly and without replacement from the point feature set.³ This randomized algorithm is both simple and efficient. Indeed, if S is the number of image features, n the number of model features, and θn the number of unoccluded model points, the probability that the s-th selection is the first to yield a basis probe consisting only of model features is given by the geometric distribution d_g(•, •):


$$ d_g\!\left(s,\ \prod_{i=1}^{c}\frac{\theta n-i+1}{S-i+1}\right) = \left[1-\prod_{i=1}^{c}\frac{\theta n-i+1}{S-i+1}\right]^{s-1}\prod_{i=1}^{c}\frac{\theta n-i+1}{S-i+1} $$


³ However, in our current implementation of the algorithm, the different bases are selected with replacement, i.e., a given basis may be selected more than once before recognition occurs.
For θ = 1, n = 16, S = 150 and c = 2, the probability that a single probe consists entirely of model features is (16 · 15)/(150 · 149) ≈ 0.0107, so such a probe will be encountered with probability 0.99 after about 430 probes have been considered. This is very useful, since the required number of probes represents roughly 2% of the 22,350 possible two-member probes.
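The geometric distribution above can be evaluated directly. The sketch below computes the per-probe success probability p, the pmf d_g(s, p), and the number of probes needed to reach a given confidence; the function names are ours, and the model assumes independent probes (which the with-replacement selection of footnote 3 provides).

```python
import math


def all_model_probe_probability(theta: float, n: int, S: int, c: int) -> float:
    """p = prod_{i=1..c} (theta*n - i + 1) / (S - i + 1)."""
    p = 1.0
    for i in range(1, c + 1):
        p *= (theta * n - i + 1) / (S - i + 1)
    return p


def d_g(s: int, p: float) -> float:
    """Geometric distribution: the first all-model probe is selection number s."""
    return (1.0 - p) ** (s - 1) * p


def probes_for_confidence(p: float, confidence: float) -> int:
    """Smallest s with P(an all-model probe among the first s) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


p = all_model_probe_probability(theta=1.0, n=16, S=150, c=2)
```

With θ = 1, n = 16, S = 150 and c = 2 this gives p = 240/22,350 ≈ 0.0107, an expected 1/p ≈ 93 probes, and `probes_for_confidence(p, 0.99)` returns 427.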

Clearly, the algorithm will frequently select either a poor basis, or a basis where at least one of the c members does not belong to the model that is embedded in the test image. In both cases, the probe will be discarded after evidence from all S − c hashes has been accumulated. However, it is possible to discard such a probe after having considered only a fraction of the image features.

Let us assume that a given probe consists entirely of model features. Let us further assume that we only consider a fraction 1/k of the remaining S − c image features, selected uniformly and without replacement: in other words, we determine hashes and z-values for only (S − c)/k image features. The probability that precisely l of the selected (S − c)/k features, with l ≤ θn − c, belong to the embedded model is given by the hypergeometric distribution d_h(•, •, •):

$$ d_h\!\left(l,\ \theta n - c,\ \frac{S-c}{k}\right) = \frac{\binom{\theta n - c}{l}\binom{S - \theta n}{(S-c)/k - l}}{\binom{S-c}{(S-c)/k}} $$

Then, the probability that at least l points of the embedded model belong to the selected fraction of image features is

$$ P_l = \sum_{j=l}^{\theta n - c} d_h\!\left(j,\ \theta n - c,\ \frac{S-c}{k}\right) $$


If we know the average contribution z_av of a hash, and if max{Z(•, •)} < l · z_av, then we can discard the current probe at this point and select another one. If, on the other hand, max{Z(•, •)} ≥ l · z_av, we proceed and accumulate evidence from the remaining (k − 1)(S − c)/k image features as well.

One can think of the 1/P_l value as the expected slowdown that results from the fact that only a fraction of the image features is used. In order for the modified probing algorithm to be useful, the effective speedup must exceed one:

$$ k \cdot P_l > 1 . $$

The value of this last expression is a function of k, l, S, θ, n and c. In Figure 8.2 we show some of the contours of the expression for θ = 1, n = 16, S = 150 and c = 2, as a function of l and k. The contour labels correspond to the value of the effective speedup. We can see that the maximum effective speedup occurs when we consider only one-seventh of the image features. But when only a small fraction of the image features is considered, the probability of a large number of model features occurring in the selected set of features is very small. Other combinations of k and l are preferable: in particular, if we require that at least 5 model features occur in the selected set of features, theoretically we can achieve an expected speedup of 1.8, and we do not need to consider more than one third of the entire set of features. During the recognition process, and after (S − 2)/3 features have been considered, we examine the largest value Z(•, •): if that value is less than 5 times the average contribution z_av that a hash contributes to any hypothesis, we discard the current probe and select another one; otherwise, we accumulate evidence from the remaining 2(S − 2)/3 image features and proceed as usual. We have incorporated this two-level randomized algorithm into our recognition system, with k = 3 and l = 5: in all cases where θ = 1, we observed a speedup of roughly 1.7 over the straightforward algorithm, which accumulates evidence from all the image features.
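A minimal serial sketch of the two-level test described above, assuming that the evidence of each image feature is available as a map from hypotheses (model/basis combinations) to z-values. The `hash_evidence` callback is hypothetical and stands in for the hash lookup; the defaults follow k = 3 and l = 5 from the text.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional

Evidence = Dict[Any, float]  # hypothesis (model, basis) -> accumulated z


def two_level_probe(features: List[Any],
                    hash_evidence: Callable[[Any], Evidence],
                    k: int = 3, l: int = 5, z_av: float = 1.0,
                    rng: Optional[random.Random] = None) -> Optional[Evidence]:
    """Accumulate evidence for one basis probe, with an early discard test.

    `features` are the S - c non-basis image features of the probe.
    Returns None if the probe is discarded after the first 1/k of them.
    """
    rng = rng or random.Random()
    order = list(features)
    rng.shuffle(order)                 # uniform selection without replacement
    first = len(order) // k
    totals: Evidence = defaultdict(float)

    def accumulate(batch: List[Any]) -> None:
        for f in batch:
            for hypothesis, z in hash_evidence(f).items():
                totals[hypothesis] += z

    accumulate(order[:first])
    if max(totals.values(), default=0.0) < l * z_av:
        return None                    # discard: select another probe
    accumulate(order[first:])          # remaining (k - 1)(S - c)/k features
    return dict(totals)
```

A discarded probe costs only about 1/k of a full pass, which is where the effective speedup comes from.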


[Figure 8.2: contour plot of the effective speedup; horizontal axis: k (2 to 16); vertical axis: l.]

Figure 8.2  Several of the contours of the effective speedup function. The horizontal axis corresponds to k, i.e., to the fraction 1/k of the image features that is considered by the probe selection algorithm. The vertical axis corresponds to l, the least number of model features that one expects to see in the selected subset. The contours correspond to effective speedups of 1.0, 1.8, 2.0, 2.5, 3.0, 3.5, 4.0 and 4.36, respectively.
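The effective speedup k · P_l plotted in Figure 8.2 can be evaluated from the hypergeometric expressions above. This sketch reflects our reading of those expressions and assumes that (S − c)/k is rounded down to an integer and that the c basis members are themselves model features (so θn − c model features remain among the S − c candidates). It is not guaranteed to reproduce the exact contour values of the figure.

```python
from math import comb


def d_h(j: int, successes: int, draws: int, population: int) -> float:
    """Hypergeometric pmf: exactly j model features among `draws` features."""
    return (comb(successes, j) * comb(population - successes, draws - j)
            / comb(population, draws))


def p_at_least(l: int, k: int, theta: float, n: int, S: int, c: int) -> float:
    """P_l: probability that at least l model features are among the sampled ones."""
    population = S - c                      # non-basis image features
    successes = int(theta * n) - c          # non-basis unoccluded model features
    draws = population // k                 # assumption: (S - c)/k rounded down
    top = min(successes, draws)
    return sum(d_h(j, successes, draws, population) for j in range(l, top + 1))


def effective_speedup(l: int, k: int, theta: float = 1.0, n: int = 16,
                      S: int = 150, c: int = 2) -> float:
    return k * p_at_least(l, k, theta, n, S, c)


# A coarse table in the spirit of Figure 8.2 (rows: l, columns: k).
for l in (1, 3, 5):
    print(l, [round(effective_speedup(l, k), 2) for k in (2, 3, 5, 7, 10)])
```

`effective_speedup(5, 3)` evaluates the setting used in our implementation (k = 3, l = 5).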


Another helpful heuristic that we incorporated in the probing algorithm is the following: a basis tuple may be discarded if the length of any of the basis components is too small or too large; the cut-off values for our implementation were 150 and 550 pixels, respectively. Further experimentation indicated that, for scenes of approximately 100 points, the resulting algorithm requires approximately 60 probe selections before recognition occurs.
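The basis selection step with the length cut-offs might look as follows for c = 2. This is a hypothetical sketch, not the system's code: `select_basis` and `max_tries` are our names, and each call draws independently, so, as in footnote 3, a basis may be drawn again on a later probe.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def select_basis(points: List[Point], rng: random.Random,
                 min_len: float = 150.0, max_len: float = 550.0,
                 max_tries: int = 10_000) -> Optional[Tuple[int, int]]:
    """Draw an ordered pair of distinct feature indices uniformly at random,
    rejecting pairs whose separation lies outside [min_len, max_len] pixels."""
    if len(points) < 2:
        return None
    for _ in range(max_tries):
        i, j = rng.sample(range(len(points)), 2)
        (x1, y1), (x2, y2) = points[i], points[j]
        if min_len <= math.hypot(x2 - x1, y2 - y1) <= max_len:
            return i, j
    return None  # no admissible basis found
```

Rejecting very short bases avoids frames in which small positional errors are strongly amplified, while the upper bound discards pairs unlikely to lie on a single object.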


8.3  Results 

In this section, we describe the experimental results for our implementation of the Bayesian geometric hashing algorithm described in Figure 7.6. The expressions used for w(•, •) and z(•) are the ones corresponding to the approximate matching approach (see section 7.10). The database models were allowed to undergo similarity transformations (i.e., rotation, translation and scaling). In the formulas that we used, f_e(•) corresponds to the case of Gaussian-distributed point features (see Eqn. 4.4). The probe selection algorithm used is the two-level randomized algorithm described in section 8.2. All of our experiments were run on an 8K-processor Connection Machine.

In order to test the aircraft models, we selected a number of photographs of the same aircraft types as our models, but from a different source [94]. The photographs were chosen because they were taken from approximately the same viewpoint as the drawings in the model database. That is, since the model drawings are side views, and since our implementation uses only similarity invariance, recognition will only be possible with views taken roughly from the side. Notably, finding such photographs is not easy, since the pictures must be taken from chase planes. However, we emphasize that the test images are real photographs, and neither drawings nor simulated data. Nor are the models taken from the same source as the photographs. The only thing that the test images and the model database have in common, other than the approximately similar viewpoints, is the aircraft type.

To test the automobile models, we obtained additional photographs of automobiles, taken at various locations in New York City. The only thing that the automobiles in the test photographs and those used to build our database have in common, other than the approximately similar viewpoint, is the automobile type.

All the test photographs were digitized using an uncalibrated CCD camera. As a result, distortions and warpings were introduced not only by the perspective projection of the 3-D plane onto the photograph, but also by the digitization process. However, such distortions might be typical of a working vision system, and we assume that they can be approximated by a similarity transformation. Edges were extracted from the resulting gray-level images using the same edge detector that was used during the building of the database. Again, no preprocessing or other filtering of the test images was performed. A polygonal approximation of the different edge map curves provided the points of the feature set. Figures 8.3 through 8.8 show the digitized photographs for three of our test inputs, together with the corresponding edge maps and the extracted point features.

In  Figure  8.3,  we  can  see  that  the  original  photograph  of  the  F-16  was  taken 
with  the  camera  positioned  below  the  airplane’s  midline  and  towards  the  back  of 
the  aircraft.  Further,  the  airplane  is  banking  to  the  left.  Also,  notice  that  the 
F-16  in  the  test  photograph  is  a  two-seat  trainer,  unlike  the  model  contained  in 
our  database. 

The original photograph of the Sea Harrier (Figure 8.5) was very small; a juxtaposed pencil helps estimate the actual size of the original. The original picture was taken with the camera positioned in front of the aircraft, as evidenced by the visible interior of the left engine intake. The airplane appearing at the bottom of the photograph is a Hunter T-8M.

The photograph of the Ford Econoline 150 was taken from the driver's side of the automobile, unlike that of the model, which was taken from the passenger's side. Instead of augmenting our database with entries corresponding to that viewpoint, we decided to mirror the test input and use its reflection about the vertical axis as the test input; this explains why the lettering on the front door of the vehicle appears reversed. Note that the current recognition system is not invariant to left/right reflection.

In Figures 8.9 through 8.14 we show the output of our system's implementation for three test inputs. The retrieved database model, appropriately scaled, rotated and translated, is shown overlaid on the test input. In the bottom half of each screendump, the nine top-retrieved database models are shown in order of decreasing accumulated evidence (column-major order). For each of the nine models, its name and the retrieved basis are also indicated. The point features corresponding to each basis are marked along the contour of the corresponding model. Above each model, bars whose length encodes the evidence accumulated by the indicated model/basis combination are also shown. It should be noted here that the recovery of the transformation was based solely on the basis pair, and not on a best least-squares match between all the corresponding model and scene feature pairs.
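Recovering a similarity transformation from the basis pair alone is a two-point fit. The sketch below shows one way to do it, under the assumption that image points are written as complex numbers x + iy; it is our illustration, not the implementation's code, and it performs no least-squares refinement, matching the text.

```python
import cmath


def similarity_from_basis(m1: complex, m2: complex,
                          s1: complex, s2: complex) -> tuple:
    """Similarity T(z) = a*z + b with T(m1) = s1 and T(m2) = s2.

    |a| is the scale factor, arg(a) the rotation angle, b the translation.
    """
    a = (s2 - s1) / (m2 - m1)
    b = s1 - a * m1
    return a, b


def transform(points, a: complex, b: complex):
    """Map model points into the scene, e.g. to draw the overlay."""
    return [a * z + b for z in points]


a, b = similarity_from_basis(0j, 1 + 0j, 1 + 1j, 1 + 3j)
scale, angle = abs(a), cmath.phase(a)   # 2.0 and pi/2 for this example
```

Because only two correspondences are used, any positional error in the two basis points propagates directly into the overlay.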

As stated in these figures, approximately 24.0 milliseconds are required per probe per scene point. This is a consequence of the less than optimal use of the floating-point hardware in the CM-2 model (hypercube model of computation).⁴

⁴ Considerable speedup is possible with the SPRINT-chip model of computation, at the expense of requiring 64-bit floating-point hardware.
Figure 8.3  A test image for the recognition algorithm: the photograph of an F-16.


Figure 8.4  The edge map extracted by the Cox-Boie edge detector (the value of σ was 2.0) for the F-16 test image. Also shown are the 80 automatically extracted features.
Figure 8.6  The edge map extracted by the Cox-Boie edge detector (the value of σ was 2.0) for the Sea Harrier test image. Also shown are the 169 automatically extracted features.
Figure 8.7  The test image of a Ford Econoline 150.
Up to thirty-two models can be included in the database without incurring any additional processing time. As a rule of thumb, a database containing 32·k models will incur an almost k-fold increase in the above processing time requirements (the number of processing elements is assumed fixed and equal to 8K). Approximately linear speedup can be achieved on a larger Connection Machine (see section 3.8).
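The rule of thumb can be turned into a rough estimate of elapsed time. The sketch below is a back-of-the-envelope model of our own: it multiplies probes, scene points and the quoted 24.0 ms figure, scales by the number of 32-model passes, and ignores the early discards of the two-level algorithm.

```python
import math

MS_PER_PROBE_PER_POINT = 24.0   # figure quoted above for the 8K-processor CM-2
MODELS_PER_PASS = 32            # up to 32 models at no extra cost


def estimated_seconds(probes: int, scene_points: int, models: int) -> float:
    """Rough elapsed time, ignoring early discards by the two-level test."""
    passes = math.ceil(models / MODELS_PER_PASS)   # 32*k models -> ~k-fold time
    return probes * scene_points * MS_PER_PROBE_PER_POINT * passes / 1000.0
```

For the F-16 run of Figure 8.9 (22 probes, 80 features, 32 models) the estimate is about 42.2 s against the reported 40.5 s; the small gap is plausibly due to probes discarded early, which the estimate does not model.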

In all our experiments, the true model/basis combination was discovered as the pair with the largest weighted vote, i.e., evidence. However, even if the correct model were not found as the maximum winner, it is assumed that a postprocessing stage would be used to verify a number of possible matches. For example, with our database of thirty-two models of sixteen points each, there are 32 · 16 · 15 = 7680 possible model/basis combinations. If the first nine or so matching model/basis combinations from the hashing algorithm are checked, we still achieve a considerable speedup over the alternative of checking all possible matchings. Moreover, the fact that the accumulated evidence for the ninth model/basis combination is considerably smaller than the evidence for the winning model/basis combination indicates that the method is robust.
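The count of candidate combinations, and the share of them that a nine-candidate verification stage would examine, can be checked with a few lines. The names are illustrative only; the count assumes every ordered pair of distinct model points is a candidate basis.

```python
def ordered_bases(models: int, points_per_model: int) -> int:
    """Every ordered pair of distinct model points is a candidate basis."""
    return models * points_per_model * (points_per_model - 1)


total = ordered_bases(32, 16)        # 32 * 16 * 15
checked_fraction = 9 / total         # verifying only the top nine candidates
```

Here `total` is 7680 and `checked_fraction` is about 0.12% of all combinations.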
[Screendump panel labels, rank: model (basis): 1: F16 (3, 5); 2: F16 (13, 15); 3: F15 (16, 5); 4: dart.1 (8, 15); 5: SeaHarrier (4, 5); 6: A6 (1, 3); 7: Tornado (2, …); 8: Mig31 (6, 9).]

Figure 8.9  The output of the implementation of our system on the Connection Machine. The test input (F-16) is shown on the top left. The edge map together with the automatically extracted point features is shown on the top right; the basis selection that led to recognition is also marked. A total of 22 basis selections was required, and the elapsed time was 40.5 seconds (NB: this time does not include the edge detection and feature extraction stages). The bars above each of the 9 top retrieved models provide a length encoding of the total accumulated evidence for the corresponding model/basis combination. The retrieved database model, appropriately scaled, rotated and translated, is shown overlaid on the test input.
Figure 8.10  The F-16 test input with the retrieved model overlaid on it. The recovered transformation (rotation, translation and scaling) was based solely on the basis pair, and not on a best least-squares match of all corresponding feature pairs.
Figure 8.11  The output of the implementation of our system on the Connection Machine. The test input (Sea Harrier) is shown on the top left. The edge map together with the automatically extracted point features is shown on the top right; the basis selection that led to recognition is also marked. A total of 4 basis selections was required, and the elapsed time was 15.7 seconds (NB: this time does not include the edge detection and feature extraction stages). The bars above each of the 9 top retrieved models provide a length encoding of the total accumulated evidence for the corresponding model/basis combination. The retrieved database model, appropriately scaled, rotated and translated, is shown overlaid on the test input.
Figure 8.12  The Sea Harrier test input with the retrieved model overlaid on it. The recovered transformation (rotation, translation and scaling) was based solely on the basis pair, and not on a best least-squares match of all corresponding feature pairs.
Figure 8.13  The output of the implementation of our system on the Connection Machine. The test input (Ford Econoline 150) is shown on the top left. The edge map together with the automatically extracted point features is shown on the top right; the basis selection that led to recognition is also marked. A total of 4 basis selections was required, and the elapsed time was 9.1 seconds (NB: this time does not include the edge detection and feature extraction stages). The bars above each of the 9 top retrieved models provide a length encoding of the total accumulated evidence for the corresponding model/basis combination. The retrieved database model, appropriately scaled, rotated and translated, is shown overlaid on the test input.
Figure 8.14  The Ford Econoline 150 test input with the retrieved model overlaid on it. The recovered transformation (rotation, translation and scaling) was based solely on the basis pair, and not on a best least-squares match of all corresponding feature pairs.
Chapter 9


Conclusion 

9.1  Summary  of  Results 

This  dissertation  makes  three  principal  contributions.  First,  the  exploitation  of 
parallelism  in  object  recognition  is  advanced.  Two  parallel  algorithms  that  realize 
the  geometric  hashing  method  are  presented.  One  algorithm  is  designed  for  an 
SIMD  hypercube-based  machine;  the  other  algorithm  is  more  general,  and  relies 
on  data  broadcasting  capabilities. 

The  first  of  the  two  algorithms  is  data  parallel  over  the  hash  table  entries  and 
regards  geometric  hashing  as  a  connectionist  algorithm  with  information  flowing 
via  patterns  of  communication.  The  second  algorithm  is  inspired  by  the  method 
of  inverse  indexing  for  data  retrieval  and  treats  the  parallel  architecture  as  a 
source  of  “intelligent  memory.”  The  algorithm  is  data  parallel  over  combinations 
of  small  subsets  of  model  features. 
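Both parallel algorithms realize the same underlying geometric hashing scheme. For orientation, here is a serial, unweighted sketch of that scheme. It is not the Connection Machine code and omits the Bayesian weights. Invariants are taken with the origin at the basis midpoint (as in Eqns. A.9 and A.11 of Appendix A), and the bin size is a hypothetical choice.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Tuple

Entry = Tuple[str, int, int]   # (model name, basis index 1, basis index 2)


def invariant(p: complex, b1: complex, b2: complex) -> complex:
    """Similarity-invariant coordinates u + iv of p with respect to (b1, b2)."""
    return (p - (b1 + b2) / 2) / (b2 - b1)


def hash_key(z: complex, bin_size: float) -> Tuple[int, int]:
    """Quantize (u, v) into a hash-table bin."""
    return round(z.real / bin_size), round(z.imag / bin_size)


def build_table(models: Dict[str, List[complex]],
                bin_size: float = 0.05) -> Dict[Tuple[int, int], List[Entry]]:
    """Off-line stage: one entry per (model, ordered basis, remaining point)."""
    table: Dict[Tuple[int, int], List[Entry]] = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for m, p in enumerate(pts):
                if m not in (i, j):
                    key = hash_key(invariant(p, pts[i], pts[j]), bin_size)
                    table[key].append((name, i, j))
    return table


def vote(table, scene: List[complex], b1: int, b2: int,
         bin_size: float = 0.05) -> Dict[Entry, int]:
    """On-line stage for one scene basis probe: unweighted voting."""
    votes: Dict[Entry, int] = defaultdict(int)
    for m, p in enumerate(scene):
        if m in (b1, b2):
            continue
        key = hash_key(invariant(p, scene[b1], scene[b2]), bin_size)
        for entry in table.get(key, []):
            votes[entry] += 1
    return dict(votes)
```

With the midpoint origin, swapping the two basis points negates (u, v), which is one example of a hash table symmetry.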

Per probe of a candidate basis, and using $M\binom{n}{c}(n-c)\,c!$ processors, the first of the algorithms has time complexity $O(\log(SMn)\log(Mn))$, whereas the second has time complexity $O(S + \log(Mn))$; M is the number of database models, n is the number of point features per model, c is the cardinality of the basis tuple, and S is the number of extracted scene features. The model of parallel computation is the concurrent-read-exclusive-write (CREW) SIMD hypercube.
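The processor count in the first algorithm's bound can be evaluated for concrete database sizes. Our reading is that it counts one entry per model, ordered basis and remaining model point, i.e., one per hash table entry; the function name is ours.

```python
from math import comb, factorial


def processors(M: int, n: int, c: int) -> int:
    """M * C(n, c) * (n - c) * c!: (model, ordered basis, other point) triples."""
    return M * comb(n, c) * (n - c) * factorial(c)
```

For the synthetic experiment below (M = 1,024, n = 16, c = 2) this gives 3,440,640, i.e., 105 entries per physical processor of a 32K-processor machine.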

The implementations of these two algorithms on a Connection Machine allow the rapid recognition of models consisting of patterns of points, embedded in scenes of several hundred points, independent of rotation, translation or scale changes, and using databases of thousands of models. With 1,024 synthetic models, each consisting of 16 point features, and scenes with 200 point features, we achieve probe times of about 250 milliseconds on a 32K-processor CM-2.

A  number  of  enhancements  to  the  geometric  hashing  method  (such  as  hash 
table  equalization,  and  the  use  of  hash  table  symmetries),  which  were  developed 
specifically  for  the  parallel  algorithms,  are  also  presented.  These  techniques  lead 
to  substantial  performance  improvements  and  are  also  applicable  to  more  general 
implementations  of  indexing-based  object  recognition  methods. 

A second contribution of this dissertation is an analysis of the expected distribution of computed invariants over the space of invariants, and a related noise sensitivity analysis. In particular, formulas for the expected distributions of computed invariants over the hash space are derived for the cases of rigid, similarity and affine transformations, and for two different distributions (Gaussian and Uniform over a disc) of point features in the model database. For the noise analysis, formulas that describe the dependency of the values of the computed invariants on Gaussian positional error are derived for the similarity and affine transformation cases. The basic underlying assumption here is that the positional accuracy of the extracted features, which is subject to sensor noise and to errors introduced during the feature extraction stage, can be modeled by a Gaussian process.

Finally, the third and most important contribution of this dissertation is an interpretation of geometric hashing that allows the algorithm to be viewed as a Bayesian approach to model-based object recognition. This interpretation, which is a new form of Bayesian-based model matching, leads to well-justified formulas, and gives a precise weighted-voting method for the evidence-gathering phase of geometric hashing. These formulas replace traditional heuristically-derived methods for performing weighted voting, and also provide a precise method for evaluating uncertainty.

A prototype object recognition system has been built using the above ideas, and is implemented on a CM-2 Connection Machine. The system is scalable and can recognize models subjected to 2D rotation, translation and scale changes in digital imagery. The models for the object database were obtained from readily available drawings of military aircraft and photographs of production automobiles. Unlike military aircraft, which tend to be elongated with distinguishing marks, automobiles have more rotationally symmetric shapes, making the recognition task potentially more difficult. Point features based on curvature extrema and singularities of extracted curves were used to build the database. The test inputs to the system are real-world, black-and-white photographs of airplanes in flight and of street scenes. The source of the test imagery is distinct from that of the model images.

The object recognition system, with the enhancements and Bayesian reasoning, is then used to locate model objects in the scenes, currently using an 8K-processor Connection Machine. We obtain very good results with similarity-invariant model matching: in all of our experiments, the correct model/basis combination accumulated the largest evidence. Currently, the model database consists of 32 objects, and the scenes typically contain a little over a hundred extracted point features.

This system is the first system of its kind that is scalable, uses large databases, can handle noisy input data, works rapidly on an existing parallel architecture, and exhibits excellent performance with real-world, natural scenes.

9.2  Future  Research  Directions 

We  end  this  dissertation  with  a  brief  mention  of  possible  future  research  directions. 

Currently, in the context of Bayesian geometric hashing and for a given basis selection, the model/basis combination accumulating the largest evidence is retained or rejected based on empirically determined thresholds. An analysis in the spirit of [37] will allow the determination of adaptive thresholds that are a function of the complexities of the viewed scene and the stored models.

Another  topic  to  be  explored  is  the  intelligent  grouping  of  features.  Currently, 
a  straightforward  randomized  algorithm  selects  candidate  basis  tuples  in  turn, 
until  recognition  is  achieved.  Methods  for  grouping  image  features,  as  well  as 
their  realization  in  a  parallel  setting,  remain  largely  unexplored  and  are  expected 
to  greatly  expedite  the  search. 

Finally, the use of higher-level image features (i.e., features other than points) for performing object recognition in the context of Bayesian geometric hashing remains largely unexplored. This will require extending the technique and designing evaluation criteria for measuring the relative merit of the different feature types. Related to this topic is the development of a control mechanism that will permit the recognition stage of the system to selectively guide the extraction of various features as the recognition process progresses.
Appendix A

Some Details Regarding the Derivation of Eqn. 6.6
In order to express X and Y as functions of U and V, we solve Eqns. 6.3 and 6.4 for X and Y; then, using Eqns. A.4 and A.5, we can show that

$$ X = z + c \tag{A.7} $$

$$ Y = w + \nu \tag{A.8} $$

where

$$ z = u\,(x_2 - x_1) - v\,(y_2 - y_1) + \frac{x_1 + x_2}{2} \tag{A.9} $$

$$ c = \frac{1}{\sigma}\Big[(U - u)(x_2 - x_1) - (V - v)(y_2 - y_1)\Big] \tag{A.10} $$

$$ w = u\,(y_2 - y_1) + v\,(x_2 - x_1) + \frac{y_1 + y_2}{2} \tag{A.11} $$

$$ \nu = \frac{1}{\sigma}\Big[(U - u)(y_2 - y_1) + (V - v)(x_2 - x_1)\Big] \tag{A.12} $$

Eqn. 6.5 can then be rewritten as

f(U,V)  = 

1 

r 

(  ((Z  +  C)2  +  (W  +  vf)2\ 

_ 

/  exp 

J  R4 

(M 

b 

CO 

00 

1 

2 

V  / 

•exp  (4  <A±A> i)  .  exp  (4<A±A> I)  (A.13) 

•  [((jAh  +  X2  ~  O X\  —  X\ )  + 

(^2  +  y2  -  vYi  -  yif]  dX1dX2dy1dy2  . 


Substitution of expressions A.9 through A.12 in A.13, and integration of the result, yields the expression in Eqn. 6.6. □